EXECUTIVE SUMMARY


The Traffic Alert and Collision Avoidance System (TCAS) is a widely deployed
safety system for reducing the risk of mid-air collision between aircraft. TCAS II
provides advisories to pilots on how to resolve potential conflicts. The threat
resolution logic of TCAS II is the product of several decades of research by many
organizations. The logic was tested in simulation on a large collection of encounter
scenarios generated by models derived from operational data. The system was
evaluated using various performance metrics, including collision risk and false alert rate.
The development cycle involved analyzing problematic encounters and manually adapting
the logic, which was then re-evaluated in simulation.

As the airspace evolves with the introduction of new air traffic control
procedures and surveillance systems, it is likely that the TCAS II threat detection
and resolution logic will require modification to meet safety and operational
requirements. Due to the complexity of the logic, modifying it may require
significant engineering effort. This report suggests a new approach to TCAS logic
development where the engineering effort is focused on developing models, allowing
computers to optimize the logic according to agreed-upon performance metrics.
Because models of sensor characteristics, pilot response behavior, and aircraft
dynamics can be constructed from operational data, they should be straightforward to
justify and vet within the safety community. The optimization of the logic according
to these models would be done using principled techniques that have been well
established in theory and practice over the past 50 years.

The objective of this report is not to develop a particular conflict resolution
algorithm, but to connect this concept of TCAS logic optimization to the existing
literature on model-based optimization. Problems involving sequential decision
making in a dynamic environment are typically modeled by a Markov decision
process, where the state at the next decision point depends probabilistically on the
current state and the chosen action. Assuming some objective measure of cost, the
best action from the current state is the one that minimizes the expected future
cost. Dynamic programming can be used to solve for the optimal action from all
possible states.
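
The idea of minimizing expected future cost can be illustrated with a minimal value iteration sketch on a toy Markov decision process. Everything in the example is an assumption made for illustration: the three states, the two actions, the transition probabilities, and the cost values are hypothetical and are not taken from this report's encounter model.

```python
def value_iteration(states, actions, transition, cost, sweeps=50):
    """Repeatedly back up expected cost until the values settle.

    transition(s, a) returns a list of (probability, next_state) pairs;
    cost(s, a) returns the immediate cost of taking action a in state s.
    """
    value = {s: 0.0 for s in states}
    policy = {}
    for _ in range(sweeps):
        for s in states:
            best = None
            for a in actions:
                q = cost(s, a) + sum(p * value[s2] for p, s2 in transition(s, a))
                if best is None or q < best:
                    best, policy[s] = q, a
            value[s] = best
    return value, policy


# Hypothetical toy problem: alerting costs 0.1, a collision costs 1.0,
# and waiting leads to a collision with probability 0.3.
STATES = ["threat", "collision", "end"]
ACTIONS = ["alert", "wait"]


def toy_transition(s, a):
    if s == "threat" and a == "wait":
        return [(0.3, "collision"), (0.7, "end")]
    return [(1.0, "end")]  # every other case moves to the absorbing end state


def toy_cost(s, a):
    if s == "collision":
        return 1.0
    if s == "threat" and a == "alert":
        return 0.1
    return 0.0


toy_value, toy_policy = value_iteration(STATES, ACTIONS, toy_transition, toy_cost)
```

In this toy problem the expected cost of waiting (0.3) exceeds the cost of alerting (0.1), so the computed policy alerts in the threat state. Changing the assumed costs shifts that balance, which is the tradeoff the optimization is meant to resolve.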

To  illustrate  some  of  the  key  concepts  of  how  dynamic  programming  might 
be  applied  to  TCAS  logic  optimization,  this  report  uses  a  simple  encounter  model 
and  evaluates  the  resulting  logic  in  simulation  using  various  performance  metrics. 
This  report  identifies  some  of  the  issues  with  applying  a  dynamic  programming 
approach. One issue is the scalability of existing solution methods to higher
dimensions. Adding dimensions results in an exponential increase in memory
and computational requirements, but several techniques suggested in this report
address this issue.

This report discusses alternatives to dynamic programming. One approach
that has been explored by others involves using conflict probability estimates to
decide when to issue resolution advisories. Although this approach will not result in
the optimal solution, it may approximate the optimal logic well. This report also
discusses the strengths and weaknesses of other approaches, such as rapidly-exploring
random trees, potential field methods, policy search, and geometric optimization.
One  of  the  primary  strengths  of  the  dynamic  programming  approach  over  these  other 
methods  is  that  it  directly  leverages  models  of  sensor  error  and  aircraft  behavior  to 
find  the  optimal  logic. 

Although this report focuses primarily on the computational aspect of
optimizing collision avoidance logic, there are other issues that require further study.
In particular, since this is a new approach to TCAS logic development, the
certifiability of the resulting logic is of particular concern. If this new approach is to be
used simply as an aid to engineers who are developing or revising collision avoidance
pseudocode, then there would be little impact on the certification process. However,
if the logic produced by dynamic programming or some other automated process is
to be used directly in a future version of TCAS, then the certification process may
be somewhat different. The core of the certification process will be the same,
involving rigorous simulation studies and flight tests to prove safety and demonstrate
operational acceptability. However, the vetting of the logic itself will involve more
than just studying the logic that will be deployed on the system. Depending on the
representation of the logic, it may not be directly comprehensible by an engineer.
Therefore, confidence would need to be established in the safety community that
the methods used to generate the logic are sound. This report represents a first step
in justifying an automated approach for generating optimized TCAS logic.

1. INTRODUCTION


The Traffic Alert and Collision Avoidance System (TCAS) is designed to reduce the rate of
mid-air collisions between aircraft. TCAS I, intended primarily for general aviation aircraft,
provides traffic advisories (TAs) to pilots. In addition to TAs, TCAS II provides resolution advisories
(RAs) that instruct pilots on how to resolve potentially hazardous situations. The threat resolution
logic of TCAS II has been shown to significantly reduce the risk of collision when other safety layers,
such as air traffic control services, have failed to maintain safe separation between aircraft [1-4].
TCAS II is currently mandated worldwide on board all large transport aircraft.

The  threat  declaration  and  resolution  logic  in  the  current  version  of  TCAS  is  the  product  of 
several  decades  of  research  by  different  organizations.  During  the  development  of  TCAS,  the  logic 
was  tested  in  simulation  on  a  large  collection  of  encounter  scenarios.  The  encounters  were  generated 
randomly  from  models  derived  from  operational  data.  The  performance  of  TCAS  was  evaluated 
using  various  performance  metrics,  including  collision  risk  and  false  alarm  rate.  The  development 
cycle  involved  analyzing  problematic  encounters  and  adapting  the  logic  manually,  which  was  then 
re-evaluated  in  simulation. 

Due to the complexity of TCAS, modifying the logic to correct issues identified in
simulation can be difficult. For example, recent efforts to correct an issue with the critical interval
portion of the logic were complicated by unanticipated ripple effects in other parts of the logic.
Next-generation air traffic control procedures and new sensor systems like Automatic Dependent
Surveillance-Broadcast (ADS-B) will likely require re-engineering much of the system and tuning
many parameters embedded in the logic.

This report investigates a decision-theoretic approach to developing collision avoidance logic
that  directly  leverages  encounter  models  to  optimize  threat-resolution  behavior.  In  this  framework, 
the  optimal  resolution  advisory  is  the  one  that  provides  the  best  expected  outcome,  balancing  the 
competing  objectives  of  minimizing  unnecessary  alerts  and  collisions.  The  engineering  effort  is 
focused  on  building  models  instead  of  designing  complex  logic. 

Since the development of TCAS, there have been significant technological and algorithmic
advances that may make an automated logic optimization approach practical. The theory of optimal
decision making under uncertainty has been applied to a wide variety of problems, from robotic
control to medical diagnosis. Leveraging these advances in theory and practice has the potential
to shorten the TCAS development cycle, reduce unnecessary alerts, and make the system
more robust to unexpected events. As the airspace and sensor capabilities evolve, the system can
be re-optimized based on updated models with relatively little engineering effort.

1.1  OBJECTIVES 

The  objectives  of  this  report  are  as  follows: 

1. Provide motivation: A model-based optimization approach to collision avoidance is a
significant departure from how the TCAS logic has been developed in the past. This report
will identify the strengths and challenges associated with such an approach and discuss why
consideration of a new approach may be worthwhile. (Section 2)

2.  Connect  problem  to  existing  literature:  A  large  body  of  literature  exists  on  optimal 
decision  making  under  uncertainty,  and  there  has  been  substantial  progress  made  in  both 
theory  and  practice  in  recent  years  that  has  not  been  fully  leveraged  by  the  TCAS  development 
community.  This  report  will  connect  the  problem  of  collision  avoidance  to  the  key  concepts, 
models,  and  solution  methods  developed  in  the  field.  (Section  3) 

3.  Demonstrate  concepts  on  simple  model:  Because  of  the  complexity  of  a  realistic  model  of 
the  airborne  collision  avoidance  problem,  this  report  focuses  on  a  simplified  model  (described 
later  in  this  section)  that  has  many  of  the  important  attributes  of  the  real  problem  but  can 
be  discussed  more  easily.  This  simplified  model  will  be  used  to  demonstrate  the  key  concepts 
of  the  approach.  (Section  3) 

4.  Discuss  performance  metrics:  Many  of  the  performance  metrics  used  to  evaluate  previous 
versions  of  TCAS  can  be  applied  to  measure  the  performance  of  a  system  developed  using  the 
proposed  approach.  This  report  will  compare  the  performance  of  the  new  approach  against 
the  existing  logic  using  several  different  metrics.  (Section  5) 

5.  Compare  with  alternative  approaches:  A  wide  variety  of  other  approaches  to  airborne 
collision  avoidance  have  been  suggested  in  the  literature.  This  report  will  briefly  survey  some 
of  the  most  significant  methods  and  relate  them  to  the  proposed  decision-theoretic  approach. 
(Sections  4  and  6) 

6.  Identify  issues  for  further  research:  Further  research  will  be  required  to  scale  the  simple 
model  and  solution  methods  to  a  working  prototype  system.  This  report  will  focus  largely  on 
the computational issues but will also discuss other issues related to certification. (Section 7)

The  scope  of  this  report  includes  only  the  threat-resolution  behavior  of  TCAS  II.  This  report 
will  only  discuss  the  conditions  for  issuing  resolution  advisories,  not  the  conditions  for  issuing  traffic 
advisories. The approach suggested in this report can accommodate different surveillance systems,
including  Mode  S  and  ADS-B,  but  the  discussion  in  most  of  this  report  does  not  depend  upon  the 
specifics  of  the  surveillance  system. 

This  report  does  not  advocate  any  specific  collision  avoidance  logic — only  an  approach  that 
can lead to the development of new logic or suggestions for improving existing logic. The
development of a new logic will require the participation and consensus of many different organizations, and
the  certification  process  will  involve  rigorous  simulation  studies  and  flight  tests  as  done  historically 
with  previous  versions  of  TCAS. 

The  remainder  of  this  section  provides  a  short  introduction  to  the  existing  TCAS  logic  and 
describes  the  hypothetical  collision  avoidance  problem  used  to  demonstrate  the  approach  suggested 
in this report. Throughout this report, the aircraft equipped with TCAS is called own aircraft,
while  a  potentially  hazardous  nearby  aircraft,  which  is  not  necessarily  equipped  with  TCAS,  is 
called  the  intruder. 

1.2 TCAS LOGIC


TCAS  decides  to  issue  RAs  based  upon  range  and  altitude  separation  criteria.  If  it  is  projected  that 
range  separation  with  an  intruder  might  be  lost  within  a  threshold  time  and  that  there  might  not 
be  at  least  a  minimum  altitude  separation  during  the  time  interval  when  range  separation  might 
be  lost,  an  RA  will  be  triggered.  The  timing  of  an  RA  depends  on  the  values  of  various  parameters 
used  in  the  resolution  logic.  The  values  of  these  parameters,  in  turn,  depend  on  the  altitude  layer 
of  own  aircraft  and  the  so-called  sensitivity  level.  Higher  sensitivity  levels  generally  result  in  earlier 
RAs.  When  the  sensitivity  level  is  set  to  two,  its  lowest  value,  the  generation  of  RAs  is  inhibited. 
During normal operation, the sensitivity level is most often determined automatically based on the
altitude  of  own  aircraft. 

TCAS issues either upward or downward sense RAs. The Climb and Descend RAs advise
the pilot to climb or descend at least 1500 ft/min, respectively. These RAs are called positive
RAs because they require positive action on the part of the pilot in order to resolve the conflict.
They generally require the most aggressive maneuvers from the pilot. Other positive RAs include
Crossing Climb, when own aircraft is projected to cross altitudes with the intruder while climbing,
and Maintain Climb, when own aircraft is already climbing at least 1500 ft/min. In addition to these,
TCAS also issues RAs that correspond to vertical speed limits (VSLs), such as Do Not Climb and
Do Not Climb > 500 ft/min. In the downward sense, the least restrictive, or weakest, VSL is
Do Not Climb > 2000 ft/min, while the most restrictive, or strongest, VSL is Do Not Climb; the
upward sense has the corresponding Do Not Descend counterparts.
The VSLs are also called negative RAs because they only require that the pilot not do something
to avoid conflict. Table 1 is a list of all possible RAs that TCAS can initially issue to the pilot.
Depending on the development of an encounter, TCAS can modify the strength of the RA or issue
sense reversals and increase rate RAs if it predicts that the vertical separation, perhaps due to
neglect of the issued RA, will be insufficient in the future.

The TCAS logic consists of several components: threat detection, initial sense selection, initial
strength  selection,  and  encounter  monitoring  and  RA  modification.  The  following  sections  outline 
the  behavior  of  these  components;  further  detail  can  be  found  elsewhere  [5,6]. 

1.2.1  Threat  Detection 

The threat detection component of the logic determines whether any nearby altitude-reporting
aircraft are potential collision threats. TCAS decides to issue RAs based on the projected time
until the closest point of approach (CPA) in range and in altitude. These quantities are called
the range tau and the vertical tau, respectively. The range tau is equal to the slant range to the
intruder  divided  by  the  closure  rate.  It  is  the  amount  of  time  required  to  reach  zero  separation 
assuming  constant  closure  rate  for  the  remainder  of  the  encounter.  The  vertical  tau,  similarly,  is 
equal  to  the  altitude  separation  divided  by  the  relative  vertical  rate  between  the  aircraft. 
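
The two tau definitions above can be written as a short sketch. The function names, the choice of feet and seconds as units, and the convention of returning infinity when the aircraft are not closing are assumptions made for illustration; they are not taken from the TCAS pseudocode.

```python
import math


def range_tau(slant_range_ft: float, closure_rate_ft_s: float) -> float:
    """Time (s) to zero range separation at a constant closure rate."""
    if closure_rate_ft_s <= 0.0:
        return math.inf  # assumption: not closing in range, so no finite tau
    return slant_range_ft / closure_rate_ft_s


def vertical_tau(altitude_sep_ft: float, vertical_closure_ft_s: float) -> float:
    """Time (s) to zero altitude separation at a constant relative vertical rate."""
    if vertical_closure_ft_s <= 0.0:
        return math.inf  # assumption: not converging in altitude
    return abs(altitude_sep_ft) / vertical_closure_ft_s


# An intruder 18,000 ft away closing at 600 ft/s is 30 s from zero range.
example_range_tau = range_tau(18000.0, 600.0)
```

Both quantities rely on the constant-rate assumption stated above, so they are estimates rather than predictions.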

To  account  for  the  possibility  that  the  intruder  might  accelerate  toward  own  aircraft  at  some 
time  in  the  future,  thereby  shortening  the  time  to  CPA,  a  second  quantity  is  defined  called  modified 
tau,  which  is  always  less  than  tau.  Modified  tau  assumes  a  particular  model  for  future  range 
acceleration by the intruder toward own aircraft. It is approximately equal to the time until the
intruder aircraft is projected to penetrate a safety buffer around own aircraft, defined as a sphere
of radius DMOD, and is zero when the intruder range is less than or equal to DMOD. Modified tau
also addresses another problem with using tau alone to define the time until CPA and to trigger
RAs: at very slow closure rates, the intruder can get very close to own aircraft before tau becomes
small enough to trigger an RA. The interval between modified tau and tau is called the critical
interval and is taken to be the interval of time during which horizontal separation could be lost.

TABLE 1

List of initial RAs, adapted from [5].

RA Type    Upward Sense RA          Target Rate (ft/min)   Downward Sense RA      Target Rate (ft/min)
Positive   Climb                    1500 to 2000           Descend                -1500 to -2000
Positive   Crossing Climb           1500 to 2000           Crossing Descend       -1500 to -2000
Positive   Maintain Climb           1500 to 4400           Maintain Descend       -1500 to -4400
Negative   Do Not Descend           > 0                    Do Not Climb           < 0
Negative   Do Not Descend > 500     > -500                 Do Not Climb > 500     < 500
Negative   Do Not Descend > 1000    > -1000                Do Not Climb > 1000    < 1000
Negative   Do Not Descend > 2000    > -2000                Do Not Climb > 2000    < 2000

Additional  constraints  are  placed  on  tau  and  modified  tau  to  avoid  problems  that  arise  if  the 
intruder  is  not  on  a  direct  collision  course  (which  TCAS  cannot  always  determine  based  on  range 
and  range  rate  measurements)  or  if  the  intruder  is  closing  very  slowly.  In  the  former  case,  tau  and 
modified tau will reach a minimum and begin to increase prior to CPA, giving a false estimate of
the  remaining  time  until  CPA.  In  the  latter  case,  their  values  can  become  unreasonably  large.  To 
prevent  this,  the  critical  interval  is  limited  to  a  maximum  time  interval  of  interest. 

If  the  intruder  is  diverging  in  altitude,  vertical  tau  is  not  meaningful.  In  that  case,  the 
decision  as  to  whether  to  issue  an  RA  is  based  on  whether  the  projected  vertical  separation  during 
the critical interval exceeds a minimum threshold.

Before  being  declared  a  threat,  the  intruder  aircraft  must  pass  both  range  and  altitude  tests. 
The range test checks whether modified tau is less than a certain time threshold. The time
threshold, called TRTHR, is typically larger at higher altitudes. As part of the range test, TCAS also
performs  several  nuisance  alarm  tests  to  prevent  intruders  that  have  large  horizontal  miss  distances 
from  being  declared  threats.  It  also  may  delay  threat  detection  if  it  appears  likely  that  by  doing 
so  it  can  avoid  issuing  an  RA  that  forces  the  aircraft  to  cross  altitudes. 

The  altitude  test  depends  on  whether  the  vertical  separation  is  currently  less  than  a  threshold. 
This  threshold,  called  ZTHR,  is  a  design  parameter  of  the  logic  and  is  a  function  of  the  altitude 
layer.  If  the  intruder’s  vertical  separation  is  less  than  ZTHR,  the  intruder  passes  the  altitude  test  if 
the  projected  vertical  separation  during  the  critical  interval  is  also  less  than  ZTHR.  If  the  current 
vertical  separation  is  greater  than  ZTHR,  the  altitude  test  is  passed  if  the  intruder  is  converging  in 
altitude and vertical tau is less than a threshold called TVTHR. TVTHR is typically larger at higher
altitudes and is typically the same as the range test threshold TRTHR. During threat detection,
future  aircraft  altitudes  are  predicted  using  linear  extrapolation. 
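
The range and altitude tests described above can be sketched as follows. This is a simplified illustration, not the TCAS pseudocode: it omits the nuisance alarm tests and the altitude-crossing delay, takes modified tau as an input, and interprets "projected vertical separation during the critical interval" as the minimum linearly extrapolated separation over that interval. The function and parameter names are hypothetical, and threshold values are supplied by the caller.

```python
def min_projected_separation_ft(rel_alt_ft, rel_rate_ft_s, t_start_s, t_end_s):
    """Smallest |relative altitude| over [t_start, t_end] under linear extrapolation."""
    at_start = rel_alt_ft + rel_rate_ft_s * t_start_s
    at_end = rel_alt_ft + rel_rate_ft_s * t_end_s
    if at_start * at_end <= 0.0:
        return 0.0  # relative altitude passes through zero inside the interval
    return min(abs(at_start), abs(at_end))


def range_test(modified_tau_s, trthr_s):
    """Passes when modified tau is below the time threshold TRTHR."""
    return modified_tau_s < trthr_s


def altitude_test(rel_alt_ft, rel_rate_ft_s, t_start_s, t_end_s, zthr_ft, tvthr_s):
    """rel_alt_ft is intruder altitude minus own altitude; rel_rate_ft_s is its rate."""
    current_sep = abs(rel_alt_ft)
    if current_sep < zthr_ft:
        projected = min_projected_separation_ft(
            rel_alt_ft, rel_rate_ft_s, t_start_s, t_end_s)
        return projected < zthr_ft
    converging = rel_alt_ft * rel_rate_ft_s < 0.0
    if not converging:
        return False
    vertical_tau_s = current_sep / abs(rel_rate_ft_s)
    return vertical_tau_s < tvthr_s


def is_threat(modified_tau_s, trthr_s, rel_alt_ft, rel_rate_ft_s,
              t_start_s, t_end_s, zthr_ft, tvthr_s):
    """An intruder is declared a threat only if it passes both tests."""
    return range_test(modified_tau_s, trthr_s) and altitude_test(
        rel_alt_ft, rel_rate_ft_s, t_start_s, t_end_s, zthr_ft, tvthr_s)
```

Here `t_start_s` and `t_end_s` stand for modified tau and tau, the two ends of the critical interval.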

1.2.2  Initial  Sense  Selection 

If  TCAS  declares  an  intruder  a  threat,  it  proceeds  to  select  the  sense  of  the  initial  RA  to  issue 
the  pilot.  The  sense  can  either  be  up  or  down,  depending  on  the  maneuver  performed  as  a  result  of 
executing  the  RA.  The  up  sense  indicates  that  own  aircraft  is  expected  to  pass  above  the  intruder 
as  a  result  of  RA  execution,  the  down  sense  that  own  aircraft  is  expected  to  pass  below.  If  the 
sense  is  anticipated  to  cause  the  aircraft  to  cross  in  altitude,  the  sense  is  termed  an  altitude-crossing 
sense. TCAS is strongly biased against selecting altitude-crossing RAs.

Often  the  sense  selection  is  determined  by  the  intruder,  if  it  is  TCAS-equipped  and  happens 
to  declare  own  aircraft  a  threat  and  performs  sense  selection  first.  In  that  case  it  will  send  a  sense- 
coordination  message  to  own  aircraft.  If  the  sense  of  the  intruder  RA  is  not  altitude-crossing,  own 
TCAS  will  select  the  complementary  sense  for  its  RA.  However,  if  the  sense  transmitted  by  the 
intruder is an altitude-crossing sense, TCAS may independently select its sense, provided certain
conditions are met, e.g., the Mode S address of own aircraft must be smaller than that of the
intruder,  among  others.  If  it  selects  the  sense  that  is  not  complementary  to  that  selected  by  the 
intruder,  its  sense-coordination  message  to  the  intruder  will  cause  the  intruder  to  reverse  its  sense 
selection. 

In the event that TCAS determines its own RA sense, it models the response to both Climb
and Descend RAs and calculates the projected vertical separation during the critical interval. As
in the threat detection component, TCAS models the intruder’s trajectory as a straight line with a
constant vertical rate. As for own aircraft, TCAS implements a pilot delay model in which the pilot
requires five seconds to respond to an initial RA. After the pilot-response delay, the own aircraft
is modeled as accelerating at 0.25 g until reaching the target vertical rate, after which it maintains
that rate for the remainder of the encounter. The target vertical rate is 1500 ft/min for a Climb RA
and -1500 ft/min for a Descend RA. If own vertical rate is greater than 1500 ft/min in the sense
direction, the target vertical rate is the current vertical rate of own aircraft in that direction up to
a maximum modeled target vertical rate of 4400 ft/min.
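
The pilot response model described above can be sketched as a piecewise trajectory: constant rate during the delay, constant acceleration until the target rate is reached, then constant rate. The function names and the value used for standard gravity (32.174 ft/s^2) are assumptions made for illustration.

```python
G_FT_S2 = 32.174  # assumed value of standard gravity in ft/s^2


def target_rate_fpm(own_rate_fpm, sense):
    """Modeled target vertical rate; sense is +1 for Climb and -1 for Descend."""
    rate_in_sense = sense * own_rate_fpm
    return sense * min(max(rate_in_sense, 1500.0), 4400.0)


def projected_altitude_change_ft(t_s, own_rate_fpm, target_fpm,
                                 delay_s=5.0, accel_g=0.25):
    """Altitude change of own aircraft t_s seconds after the RA is issued."""
    v0 = own_rate_fpm / 60.0   # current vertical rate, ft/s
    vt = target_fpm / 60.0     # target vertical rate, ft/s
    accel = accel_g * G_FT_S2
    if t_s <= delay_s:
        return v0 * t_s        # pilot has not yet responded
    t_rem = t_s - delay_s
    t_acc = abs(vt - v0) / accel            # time needed to reach the target rate
    sign = 1.0 if vt >= v0 else -1.0
    if t_rem <= t_acc:
        return v0 * t_s + 0.5 * sign * accel * t_rem ** 2
    ramp = v0 * t_acc + 0.5 * sign * accel * t_acc ** 2
    return v0 * delay_s + ramp + vt * (t_rem - t_acc)
```

For example, an aircraft already climbing at 3000 ft/min keeps that rate as its target for a Climb RA, while one climbing at 5000 ft/min is modeled at the 4400 ft/min cap.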

TCAS  issues  RAs  to  ensure  that  the  aircraft  maintain  at  least  the  desired  minimum  vertical 
separation  during  the  critical  interval.  The  desired  minimum  vertical  separation,  called  ALIM,  is  a 
design parameter of the logic. It is a function of the altitude layer and is less than the threshold
ZTHR.

TCAS  will  select  the  non-altitude-crossing  sense,  provided  the  aircraft  are  not  within  100  ft 
of  each  other  vertically,  if  the  projected  vertical  separation  is  at  least  equal  to  ALIM.  Additionally, 
an  altitude  crossing  must  not  be  projected  to  be  inevitable.  If  these  conditions  are  not  met,  TCAS 
selects  the  sense  that  provides  the  greater  separation.  In  the  case  that  both  senses  provide  the 
same separation, TCAS always selects the down sense. Note that if the non-altitude-crossing sense
provides at least the minimum separation, TCAS selects this sense over the altitude-crossing sense
even  if  the  latter  provides  more  separation. 

Figure 1. Initial RA sense selection process.

Figure  1  is  an  illustration  of  how  RA  sense  selection  is  performed.  Because  the  up-sense 
RA  is  an  altitude-crossing  RA,  TCAS  will  select  the  down-sense  RA  if  d2  >  ALIM  and  the  other 
conditions  of  the  previous  paragraph  are  met.  Otherwise,  TCAS  will  select  the  sense  that  provides 
the greater separation, which again happens to be the down sense because d2 > d1.
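
The sense selection rules above can be condensed into a short sketch. The function name and arguments are hypothetical, the projected separations are assumed to be computed elsewhere, and the treatment of exactly 100 ft of current vertical separation is an assumption, since the text does not state whether that boundary counts as "within 100 ft".

```python
def select_sense(sep_up_ft, sep_down_ft, crossing_sense, alim_ft,
                 current_vertical_sep_ft, crossing_inevitable=False):
    """Return "up" or "down" for the initial RA.

    sep_up_ft and sep_down_ft are the projected vertical separations for each
    sense; crossing_sense is "up", "down", or None when neither sense crosses.
    """
    if crossing_sense in ("up", "down"):
        non_crossing = "down" if crossing_sense == "up" else "up"
        sep_non_crossing = sep_down_ft if non_crossing == "down" else sep_up_ft
        if (sep_non_crossing >= alim_ft
                and abs(current_vertical_sep_ft) > 100.0
                and not crossing_inevitable):
            return non_crossing
    # Otherwise take the sense with the greater separation; ties go to down.
    return "up" if sep_up_ft > sep_down_ft else "down"
```

With hypothetical values of 300 ft for the up sense and 700 ft for the down sense, and with the up sense altitude-crossing, the function returns the down sense whether or not 700 ft meets ALIM, which matches the outcome described for Figure 1.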

1.2.3  Initial  Strength  Selection 

The  initial  sense  selection  component  determines  whether  a  Climb  or  Descend  RA,  depending 
on  the  sense  selected,  is  projected  to  provide  at  least  the  minimum  separation,  or  at  least  more 
separation than the opposite sense. However, there may exist a vertical speed limit of the same
sense  that  provides  sufficient  separation  and  is  less  disruptive  to  the  own  aircraft  trajectory.  When 
no  prior  RA  has  been  issued  against  a  particular  intruder,  the  strength  selection  component  of  the 
logic  determines  the  weakest  VSL  that  achieves  at  least  the  desired  separation  during  the  critical 
interval. If none of the VSLs provides sufficient separation, the logic will select the positive Climb
or  Descend  RA  as  appropriate  for  the  selected  sense. 
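
Strength selection as described above amounts to a search from the weakest VSL to the strongest. The sketch below assumes a caller-supplied function that returns the projected separation for a given vertical rate limit; that function, the list layout, and the toy separation model in the usage lines are hypothetical and serve only to show the search order.

```python
# Downward-sense VSLs from Table 1, ordered weakest to strongest,
# paired with the vertical rate limit in ft/min.
DOWN_SENSE_VSLS = [
    ("Do Not Climb > 2000", 2000.0),
    ("Do Not Climb > 1000", 1000.0),
    ("Do Not Climb > 500", 500.0),
    ("Do Not Climb", 0.0),
]


def select_strength(vsls, positive_ra, projected_separation_ft, alim_ft):
    """Return the weakest VSL achieving ALIM, or the positive RA if none does."""
    for name, rate_limit_fpm in vsls:
        if projected_separation_ft(rate_limit_fpm) >= alim_ft:
            return name
    return positive_ra


def toy_separation_ft(rate_limit_fpm):
    """Hypothetical separation model: tighter limits give more separation."""
    return 600.0 - 0.2 * rate_limit_fpm


chosen = select_strength(DOWN_SENSE_VSLS, "Descend", toy_separation_ft, 400.0)
```

Under the toy model, the 2000 ft/min limit yields only 200 ft while the 1000 ft/min limit yields 400 ft, so the search stops at Do Not Climb > 1000.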

1.2.4 Encounter Monitoring and RA Modifications

TCAS  continues  to  monitor  the  development  of  an  encounter  and,  if  necessary,  modifies  the 
initial  RA  that  it  issued.  TCAS  re-evaluates  the  effect  of  the  current  RA  strength  during  every 
update  cycle  after  initial  RA  selection.  If  the  selected  RA  is  a  VSL,  it  will  be  strengthened  to  a 
stronger  VSL  or  to  a  positive  Climb  or  Descend  if  necessary  to  ensure  adequate  separation  during 
the  critical  interval.  The  strength  of  an  RA  is  never  reduced  except  when  a  Climb  or  Descend  RA 
can  be  weakened  to  a  Do  Not  Descend  or  Do  Not  Climb  VSL  when  own  aircraft  can  safely  level  off. 

If a Climb or Descend RA has been issued against a certain intruder but subsequent update
cycles indicate insufficient vertical separation, the logic performs tests to determine the feasibility of
reversing the sense of the RA. If the reversal tests do not permit sense reversal, the Climb or Descend
RAs may be strengthened to Increase Climb or Increase Descend RAs, respectively, under certain
conditions. These increase rate RAs advise the pilot to climb or descend at least 2500 ft/min.

The pilot-response delay assumed by TCAS in projecting the results of all RA modifications
is 2.5 seconds. For sense reversals and increase rate RAs, own aircraft is assumed to accelerate at
0.35 g until reaching the target vertical rate.

1.3 HYPOTHETICAL COLLISION AVOIDANCE PROBLEM

This report uses a relatively simple, hypothetical collision avoidance problem to demonstrate the
concept of model-based logic optimization. There is a single unequipped intruder approaching with
a constant closure rate. The collision avoidance system on the own aircraft may issue one of two
RAs that correspond to climb and descend vertical maneuvers. The Climb RA advises the pilot
to climb at least 1500 ft/min. The Descend RA advises the pilot to descend at least 1500 ft/min.
When an RA is issued, it takes the pilot five seconds to respond, after which the pilot applies a
0.25 g acceleration to meet the minimum desired vertical rate. If the pilot is already complying
with the RA (e.g., descending 2500 ft/min when a Descend RA is issued), no acceleration is applied.
The intruder vertical acceleration follows a white noise model with a standard deviation of 1 ft/s².
When the own aircraft is not responding to an RA, its vertical acceleration follows the same noise
model as the intruder. Figure 2 summarizes the properties of the problem.

Figure 2. Hypothetical collision avoidance problem.

The state variables are as follows:

• h, the altitude of intruder relative to own,

• τ, the time to closest horizontal approach,

• ḣ1, own vertical rate,

• ḣ2, intruder vertical rate, and

• s_RA, the RA state variable.

TABLE 2

Distribution of aircraft start states in the hypothetical collision avoidance problem.

Variable       Range                Distribution
h (ft)         [-500, 500]          uniform
τ (s)          20                   -
ḣ1 (ft/min)    [-1000, 1000]        uniform
ḣ2 (ft/min)    [-1000, 1000]        uniform
s_RA           clear of conflict    -

In this problem, a conflict occurs when τ = 0 and |h| < 100 ft. Appendix A provides a mathematical
specification  for  the  probabilistic  dynamics  of  these  variables. 

The  initial  state  is  selected  randomly  according  to  the  distribution  specified  in  Table  2.  All 
encounters start 20 seconds prior to closest horizontal approach. The values for h, ḣ1, and ḣ2 are
chosen  uniformly  within  the  specified  ranges.  No  RA  was  previously  issued  prior  to  the  start  of  the 
encounter,  hence  the  clear  of  conflict  RA  state.  The  distribution  was  chosen  so  that  the  system  on 
own  aircraft  alerts  at  a  relatively  frequent  rate. 
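
The start-state distribution and the conflict definition can be combined into a small Monte Carlo sketch of encounters in which no RA is ever issued. This is an illustration only: the one-second time step and the way the white noise acceleration is applied at each step are assumptions, since the exact dynamics are specified in Appendix A rather than here, and the function and key names are hypothetical.

```python
import random


def sample_initial_state(rng):
    """Draw a start state per Table 2, converting vertical rates to ft/s."""
    return {
        "h": rng.uniform(-500.0, 500.0),               # intruder minus own altitude, ft
        "tau": 20,                                     # s to closest horizontal approach
        "dh_own": rng.uniform(-1000.0, 1000.0) / 60.0,
        "dh_int": rng.uniform(-1000.0, 1000.0) / 60.0,
        "s_ra": "clear of conflict",
    }


def ends_in_conflict(state, rng, sigma_ft_s2=1.0, dt_s=1.0):
    """Propagate an unalerted encounter to tau = 0 and apply the conflict test."""
    h, dh_own, dh_int = state["h"], state["dh_own"], state["dh_int"]
    steps = int(round(state["tau"] / dt_s))
    for _ in range(steps):
        a_own = rng.gauss(0.0, sigma_ft_s2)  # own follows the same noise model
        a_int = rng.gauss(0.0, sigma_ft_s2)
        h += (dh_int - dh_own) * dt_s + 0.5 * (a_int - a_own) * dt_s ** 2
        dh_own += a_own * dt_s
        dh_int += a_int * dt_s
    return abs(h) < 100.0


def unalerted_conflict_rate(n_encounters, seed=0):
    """Fraction of sampled encounters that end in conflict without any RA."""
    rng = random.Random(seed)
    hits = sum(ends_in_conflict(sample_initial_state(rng), rng)
               for _ in range(n_encounters))
    return hits / n_encounters
```

Calling `unalerted_conflict_rate(10000)` gives a rough baseline against which an alerting logic could be compared; the value depends on the assumed discretization and should not be read as a result of this report.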

Although  the  model  contains  five  state  variables,  it  only  adequately  represents  motion  in  a 
head-on encounter in two spatial dimensions. In three spatial dimensions, aircraft may be
adequately separated horizontally at the time of closest approach (τ = 0). It is possible to extend
this model into three spatial dimensions by changing τ to mean the time to horizontal conflict
and modeling the influence of horizontal motion on τ (Section 7.1). Recently, several probabilistic
models  have  been  developed  based  on  radar  data  [7-10].  These  can  be  integrated  into  the  current 
framework. 


1.4  OVERVIEW 

This  section  outlined  the  objectives  of  this  report,  briefly  outlined  the  TCAS  logic,  and  described 
a  simple  encounter  model  to  be  used  as  a  running  example  to  illustrate  the  decision-theoretic 
approach  introduced  in  this  report.  The  remainder  of  this  report  proceeds  as  follows. 

Section  2  formulates  collision  avoidance  as  an  optimization  problem.  It  begins  by  formally 
specifying  the  performance  metric  to  be  optimized  in  terms  of  conflict  and  alerting  probabilities. 
Because  sensor  information  is  imperfect  and  the  future  behavior  of  the  aircraft  involved  in  the 
encounter  cannot  be  known  perfectly,  the  optimal  logic  will  need  to  leverage  models  of  uncertainty 
in  observation  and  state  dynamics.  This  section  connects  the  requirements  of  an  optimal  logic  to 
the  body  of  work  on  Markov  decision  processes  that  can  be  solved  using  dynamic  programming. 

Section 3 discusses how dynamic programming can be used to find the optimal logic. It
presents a dynamic programming algorithm called fitted value iteration that involves discretizing the
state space and using a local averaging scheme to interpolate between the discrete states. Because
the number of discrete states can become prohibitively large, methods to reduce the planning
complexity are introduced.

Section 4 presents an alternative method for developing collision avoidance logic that relies
on online sample-based estimates of the probability of future conflict between aircraft. Various
methods for estimating the probability of conflict are introduced. A variance-reduction technique
known as importance sampling is shown to provide more accurate estimates using fewer samples
than the naive approach. Several ways of using the probability of conflict in the development of
collision avoidance logic are discussed.

Section 5 discusses performance metrics and summarizes the results of the preliminary
evaluation of the logic developed in Sections 3 and 4. Performance tradeoffs inherent in the development of
collision avoidance systems are analyzed through the use of system operating characteristic curves.
This section also focuses on the instrumental role these curves play in the effective placement of
system parameters. The optimal logic is evaluated in a safety assessment tool developed at Lincoln
Laboratory and used in prior TCAS safety studies. The results of this preliminary evaluation are
presented.

Section 6 describes a collection of alternative approaches suggested in the literature for conflict
avoidance, including potential field methods, rapidly exploring random trees, geometric
optimization, and policy search. Advantages and disadvantages of each approach are discussed.

Section 7 outlines several areas of further research. The simple collision avoidance model
introduced earlier needs to be extended in several different ways. For example, the model will need
to be adapted to support advisory strengthening and reversal, equipped intruders, and
nondeterministic pilot response. Further investigation is also required into policy representation and issues
of model robustness.

Section  8  concludes  the  report. 

There are nine appendices that further elaborate on concepts introduced in the body of this
report. Notation is local to the section or appendix in which it is introduced, except for the
notation used for the state variables (Section 1.3) and for dynamic programming (Section 3), which
are summarized in a table following the references.


2.  PROBLEM  FORMULATION 


Before attempting to find an optimal collision avoidance system logic, it is important to define
the metric by which collision avoidance systems are to be measured and compared. Section 2.1
describes a performance metric that may be used. Logic that optimizes performance must take
into account state uncertainty and dynamic uncertainty, both of which are discussed in Sections 2.2
and 2.3. Section 2.4 discusses how to use models of state and dynamic uncertainty to make optimal
decisions. Further discussion of some of the points introduced in these earlier sections is reserved
for Section 2.5.

2.1  PERFORMANCE  METRIC 

The performance metric needs to take into account the competing objectives of preventing conflict
and minimizing alert rate (where “alert” refers to an RA, not a TA, for the purposes of this report).
Other objectives may also be taken into consideration, such as flight-plan deviation. Prior TCAS
studies defined a conflict as a loss of separation to within 100 ft vertically and 500 ft horizontally [4]. Such
conflicts have been called near mid-air collisions (NMACs). Although this report focuses on conflicts
with other aircraft, in general conflicts could involve other forms of failure, such as collision with
terrain.

One way to balance conflict prevention with alert minimization is to define a cost metric that
is a function of whether a conflict or an alert occurred. Let $C(\omega)$ be one if a conflict occurred
during encounter $\omega$ and zero otherwise, and let $A(\omega)$ be one if the encounter involved an alert and
zero otherwise. The cost associated with a particular encounter $\omega$ may be denoted

$$c(\omega) = C(\omega) + \lambda A(\omega). \tag{1}$$

The scalar $\lambda$ is a parameter of the cost function that controls the cost of alerting relative to the
cost of conflict. This cost function amounts to assigning unit cost to conflict and a cost of $\lambda$ to an
alert. The value of $\lambda$ depends on the preference of the evaluators of the system. It may be difficult
for human designers to think in terms of relative costs, so instead of choosing $\lambda$ directly, it may be
desirable to choose the $\lambda$ that translates to the desired safety threshold as discussed in Section 5.
The linear form of Equation 1 is often used for multiple-attribute decision problems [11], but the
cost function need not be linear in general. Winder and Kuchar, for example, assign the same cost
to any encounter that involves a collision, regardless of whether an alert was issued [12].

One way to compare two different collision avoidance logics is to compare their expected cost
over the space of encounter scenarios. The expected cost is given by

$$\sum_{\omega} \Pr(\omega)\, c(\omega), \tag{2}$$

where $\Pr(\omega)$ is the probability of encounter $\omega$ when using the system. Prior TCAS studies
involved developing encounter models from radar data to represent this probability mass
function [1-3,7,9,13-16], although they focused primarily on computing probability of conflict metrics.
A performance metric based on expected cost is certainly a sensible way to compare logics, but
alternative metrics exist [17].
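
As a small worked example of Equations 1 and 2, the following sketch computes the expected cost over a handful of encounters. The encounter probabilities, outcomes, and the value of $\lambda$ are invented purely for illustration; they are not results from this report.

```python
def encounter_cost(conflict, alert, lam):
    """Equation 1: unit cost for a conflict plus lam for an alert."""
    return float(conflict) + lam * float(alert)

def expected_cost(encounters, lam):
    """Equation 2: probability-weighted sum of encounter costs.

    encounters is a list of (probability, conflict, alert) tuples.
    """
    return sum(p * encounter_cost(c, a, lam) for p, c, a in encounters)

# Hypothetical encounter space: probabilities sum to one.
encounters = [
    (0.900, False, False),  # no conflict, no alert
    (0.080, False, True),   # alert issued, conflict avoided
    (0.015, True, True),    # alert issued, conflict still occurred
    (0.005, True, False),   # conflict without any alert
]
lam = 0.1  # assumed cost of an alert relative to a conflict
cost = expected_cost(encounters, lam)  # 0.08*0.1 + 0.015*1.1 + 0.005*1.0 = 0.0295
```

Raising `lam` penalizes logics that alert often, while lowering it favors logics that accept more alerts in exchange for fewer conflicts.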

Before  discussing  how  to  find  a  logic  that  provides  the  lowest  expected  cost,  it  should  be 
emphasized  that,  prior  to  this  work,  a  multi-objective  metric  has  not  been  directly  used  to  evaluate 
TCAS. Various metrics related to successful alerts and unnecessary alerts were treated
independently [18]. Even when using a multi-objective metric to compare logic, there is still value in
analyzing  performance  relative  to  individual  objectives  (Section  5). 


2.2  STATE  UNCERTAINTY 

Determining whether to alert and which RA to issue requires knowledge of the state of the world.
Several attributes of the state are important in collision avoidance, such as own and intruder
positions and velocities. Because TCAS can only make imperfect measurements due to the limitations
of its sensors, it is never completely certain about the actual state of the world. This uncertainty
can be modeled as a probability distribution over the space of possible states. In this report, $b(s)$
indicates the probability assigned to state $s$. In the literature, the mapping $b$ is called a belief
state [19].

The current version of TCAS uses an altimeter to measure own altitude and beacon
surveillance to measure the range, altitude, and bearing of intruders. This report refers to the aggregate
of all of these sensor measurements at a particular time as an observation. Updating the belief state
based on an observation requires knowledge of sensor performance; it would be difficult
to make good decisions if there were no knowledge of how sensor measurements relate to the state
of the world. In particular, it is useful to know as well as possible the probability distribution over
observations given that the current state is $s$, which may be written $\Pr(\cdot \mid s)$. The model used by
the system to represent this probability is called the observation model (or, sometimes, sensor or
noise model). It is also useful to have a dynamic model (or, sometimes, state-transition or process
model), which represents $\Pr(\cdot \mid s, a)$, a distribution over future states given that action $a$ was taken
from state $s$. The dynamic model represents how the state evolves in response to the actions taken
by the collision avoidance system.

Given the current belief state $b$, action $a$, and observation $o$, it follows from Bayes' rule that
the updated belief state $b'$ is given by

$$\begin{aligned}
b'(s') &= \Pr(s' \mid o, a, b) \\
&\propto \Pr(o \mid s', a, b) \Pr(s' \mid a, b) \\
&= \Pr(o \mid s') \sum_{s} \Pr(s' \mid s, a)\, b(s).
\end{aligned} \tag{3}$$

The process of updating the belief state is known as belief updating, filtering, or state estimation,
and there is a wealth of literature on the subject [20,21]. A Kalman filter is one way to efficiently
update the belief state for certain classes of dynamic and observation models [22-24]. The current
version of TCAS uses an $\alpha$-$\beta$ tracker for altitude and a separate tracker for intruder range. These
trackers do not directly take into account explicit dynamic and observation models, but they have
been tuned to work well for the TCAS operating environment. They also do not explicitly represent
a distribution over states; they simply output a single state estimate that is used by the threat
logic.
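
For a discrete state space, the belief update in Equation 3 amounts to a prediction step followed by reweighting by the observation likelihood and normalizing. The sketch below is a minimal illustration, not the tracker used by TCAS; the two-state model (`near`, `far`), the action `none`, the observations (`close`, `distant`), and all probabilities are hypothetical.

```python
def update_belief(belief, action, observation, transition, observation_model):
    """Return the updated belief b' of Equation 3 for a discrete state space.

    belief:            {s: b(s)}
    transition:        {(s, a): {s_next: Pr(s_next | s, a)}}
    observation_model: {s_next: {o: Pr(o | s_next)}}
    """
    unnormalized = {}
    for s_next in belief:
        # Prediction: sum over s of Pr(s' | s, a) b(s)
        predicted = sum(
            transition[(s, action)].get(s_next, 0.0) * b for s, b in belief.items()
        )
        # Correction: weight by the observation likelihood Pr(o | s')
        unnormalized[s_next] = observation_model[s_next].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the models")
    return {s: w / total for s, w in unnormalized.items()}

# Hypothetical two-state example.
transition = {
    ("near", "none"): {"near": 0.9, "far": 0.1},
    ("far", "none"): {"near": 0.2, "far": 0.8},
}
observation_model = {
    "near": {"close": 0.8, "distant": 0.2},
    "far": {"close": 0.3, "distant": 0.7},
}
belief = {"near": 0.5, "far": 0.5}
updated = update_belief(belief, "none", "close", transition, observation_model)
```

Observing `close` shifts probability toward `near`, as expected from the assumed observation model.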


2.3  DYNAMIC  UNCERTAINTY 

The behavior of the aircraft involved in an encounter and the response of pilots to RAs are
nondeterministic. If the current state is $s$ and action $a$ is taken by the collision avoidance system, the
probability distribution over the next state (say one second later) is given by $\Pr(\cdot \mid s, a)$, which is
specified by the dynamic model. Because the current state is not known exactly, as discussed in
Section 2.2, the belief state $b$ must be used to predict the state at the next time step:

$$\Pr(s' \mid b, a) = \sum_{s} b(s) \Pr(s' \mid s, a). \tag{4}$$

The  dynamic  model  may  be  repeatedly  applied  to  predict  the  future  state  any  number  of  steps  into 
the  future.  Assuming  that  the  state  distribution  is  known  t  steps  into  the  future,  the  following 
formula  may  be  used  to  estimate  the  state  distribution  t  +  1  steps  into  the  future: 

$$\Pr(s_{t+1} \mid b_0, a_0, \ldots, a_t) = \sum_{s_t} \Pr(s_{t+1} \mid s_t, a_t) \Pr(s_t \mid b_0, a_0, \ldots, a_{t-1}), \tag{5}$$

where $a_0, \ldots, a_t$ is the sequence of actions taken by the collision avoidance system. Equation 5
assumes that the dynamics are Markovian, meaning the distribution over states at the next time
step depends only on the current state and action executed. So long as sufficient information is
encoded in the state, this assumption is reasonable for a large class of dynamic systems [25].
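
Repeated application of the dynamic model, as in Equations 4 and 5, can be sketched as follows. The two-state transition table and the action name `none` are hypothetical and serve only to make the recursion concrete.

```python
def predict_one_step(dist, action, transition):
    """Push a state distribution through the dynamic model once (Equation 4)."""
    nxt = {}
    for s, p in dist.items():
        for s_next, q in transition[(s, action)].items():
            nxt[s_next] = nxt.get(s_next, 0.0) + p * q
    return nxt

def predict(belief, actions, transition):
    """Apply Equation 5 once per action in the sequence a_0, ..., a_t."""
    dist = dict(belief)
    for a in actions:
        dist = predict_one_step(dist, a, transition)
    return dist

# Hypothetical Markovian dynamics over two states.
transition = {
    ("near", "none"): {"near": 0.9, "far": 0.1},
    ("far", "none"): {"near": 0.2, "far": 0.8},
}
two_steps = predict({"near": 1.0, "far": 0.0}, ["none", "none"], transition)
# After two steps: Pr(near) = 0.9*0.9 + 0.1*0.2 = 0.83 and Pr(far) = 0.17
```

Because the dynamics are assumed Markovian, each step needs only the distribution from the previous step and the action taken.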

As  discussed  by  Yang,  it  is  important  that  the  dynamic  model  be  as  accurate  as  possible  [26]. 
An  inaccurate  model  can  increase  the  collision  or  alert  rate  because  it  cannot  adequately  distinguish 
safe from hazardous flight trajectories. The current version of TCAS, as well as many alerting
systems suggested in the literature [27-29], uses deterministic (noiseless) dynamic models. These
systems often use straight-line projection of aircraft trajectories to predict whether a conflict will
occur.  In  order  to  compensate  for  the  assumption  that  aircraft  will  not  deviate  from  their  nominal 
trajectories,  these  systems  artificially  enlarge  their  conflict  boundaries.  Although  such  systems 
can  provide  low  probability  of  conflict,  it  is  at  the  expense  of  a  higher  alert  rate.  Other  systems 
use  worst-case  trajectory  models  that  determine  whether  a  conflict  is  possible  [30-32].  Although 
this  type  of  approach  does  not  require  artificially  enlarging  the  conflict  boundaries,  it  typically 
has  an  unnecessarily  high  false  alert  rate  because  it  does  not  distinguish  threats  according  to 
their  likelihood.  Kuchar  and  Yang  survey  a  variety  of  different  approaches  that  use  probabilistic, 
deterministic,  and  worst-case  state  projection  [33]. 

2.4  OPTIMAL  DECISION  MAKING 

Figure 3 shows the relationship between the tracker (belief updating process) and the logic that
decides upon the action to take at the current instant in time. The belief state $b$, as computed
by the tracker in the current version of TCAS, is simply a single point estimate that assigns all
probability to a single state. In general, this need not be the case, and, in fact, it may be possible
to make better decisions if the uncertainty in the underlying state is distributed more realistically.

Although this report discusses tracking issues to some extent, its primary focus
is on the logic. The logic is a function that takes as input a belief state $b$ and outputs an action $a$
to be executed. In the optimal control literature, this function is often called a policy or decision
rule [25,34,35]. The objective is to use the dynamic and sensor models to find an optimal policy
$\pi^*$ that provides the lowest expected cost according to some metric.

Assuming that the state is known exactly, the expected cost when following a policy $\pi$ (not
necessarily the optimal one) satisfies the recursion

$$J^{\pi}(s) = \underbrace{c(s, \pi(s))}_{\text{current cost}} + \underbrace{\sum_{s'} \Pr(s' \mid s, \pi(s))\, J^{\pi}(s')}_{\text{expected future cost}}, \tag{6}$$


where $c(s, \pi(s))$ is the immediate cost associated with being in state $s$ and executing the action
specified by the policy $\pi$ for that state. If Equation 1 is used as the encounter cost function, the
immediate cost is one if a conflict occurred for the first time, $\lambda$ if the system alerted for the first
time, and zero otherwise. The function $J^{\pi}$ is called the cost-to-go function.

In order to calculate $J^{\pi}$, it is useful to define the mapping

$$B_{\pi} J(s) = c(s, \pi(s)) + \sum_{s'} \Pr(s' \mid s, \pi(s))\, J(s'). \tag{7}$$


If $J_0$ is the cost-to-go function that assigns zero to all states, then $B_{\pi} J_0$ is the cost-to-go function
after one time step according to policy $\pi$. The cost-to-go function when following $\pi$ for $k$ decisions
is given by $B_{\pi}^{k} J_0$. Repeated application of $B_{\pi}$ leads to convergence to $J^{\pi}$ [25].
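
The evaluation of a fixed policy by repeated application of the mapping in Equation 7 can be sketched on a toy problem. Everything about the problem below is an assumption made for illustration: the three states (`threat`, `conflict`, `done`), the two actions (`wait`, `alert`), the transition probabilities, and the alert cost `LAMBDA`.

```python
LAMBDA = 0.1  # hypothetical cost of an alert relative to a conflict

STATES = ["threat", "conflict", "done"]
TRANSITION = {
    ("threat", "wait"): {"conflict": 0.5, "done": 0.5},
    ("threat", "alert"): {"conflict": 0.1, "done": 0.9},
    ("conflict", "wait"): {"done": 1.0},
    ("conflict", "alert"): {"done": 1.0},
    ("done", "wait"): {"done": 1.0},
    ("done", "alert"): {"done": 1.0},
}

def cost(s, a):
    """Unit cost in conflict, plus LAMBDA for alerting while the threat is active."""
    conflict_cost = 1.0 if s == "conflict" else 0.0
    alert_cost = LAMBDA if (s == "threat" and a == "alert") else 0.0
    return conflict_cost + alert_cost

def policy_backup(J, policy):
    """One application of the mapping in Equation 7."""
    return {
        s: cost(s, policy[s])
        + sum(p * J[s2] for s2, p in TRANSITION[(s, policy[s])].items())
        for s in STATES
    }

def evaluate_policy(policy, tol=1e-12, max_iter=1000):
    """Apply the backup to J_0 = 0 until the cost-to-go stops changing."""
    J = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        J_next = policy_backup(J, policy)
        if max(abs(J_next[s] - J[s]) for s in STATES) < tol:
            return J_next
        J = J_next
    return J

J_wait = evaluate_policy({"threat": "wait", "conflict": "wait", "done": "wait"})
J_alert = evaluate_policy({"threat": "alert", "conflict": "wait", "done": "wait"})
```

In this toy problem the policy that alerts in the `threat` state has expected cost 0.2 from that state, compared with 0.5 for the policy that never alerts.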

The  optimal  cost-to-go  function  obtained  by  following  the  optimal  policy  satisfies  the  recursion 


$$J^*(s) = \min_{a} \left[ c(s, a) + \sum_{s'} \Pr(s' \mid s, a)\, J^*(s') \right]. \tag{8}$$


The optimal cost-to-go function may be computed in a way similar to $J^{\pi}$ using the mapping $B$,
known as the Bellman update operator,

$$BJ(s) = \min_{a} \left[ c(s, a) + \sum_{s'} \Pr(s' \mid s, a)\, J(s') \right]. \tag{9}$$


Repeated application of $B$ to $J_0$ leads to convergence to $J^*$. This process is called value iteration.
Once the optimal cost-to-go function is computed, the optimal policy $\pi^*$ may be extracted as
follows:

$$\pi^*(s) = \arg\min_{a} \left[ c(s, a) + \sum_{s'} \Pr(s' \mid s, a)\, J^*(s') \right]. \tag{10}$$


Figure 3. Data flow between the tracker and logic.
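
Value iteration and policy extraction (Equations 9 and 10) can be sketched on a small discrete problem. The states, actions, transition probabilities, and alert costs below are hypothetical and chosen only so that the effect of the alert cost on the resulting policy is easy to see.

```python
STATES = ["threat", "conflict", "done"]
ACTIONS = ["wait", "alert"]
TRANSITION = {
    ("threat", "wait"): {"conflict": 0.5, "done": 0.5},
    ("threat", "alert"): {"conflict": 0.1, "done": 0.9},
    ("conflict", "wait"): {"done": 1.0},
    ("conflict", "alert"): {"done": 1.0},
    ("done", "wait"): {"done": 1.0},
    ("done", "alert"): {"done": 1.0},
}

def make_cost(lam):
    """Immediate cost: one in conflict, lam for alerting while the threat is active."""
    def cost(s, a):
        conflict_cost = 1.0 if s == "conflict" else 0.0
        alert_cost = lam if (s == "threat" and a == "alert") else 0.0
        return conflict_cost + alert_cost
    return cost

def value_iteration(cost, tol=1e-12, max_iter=1000):
    """Apply the Bellman update (Equation 9) to J_0 = 0, then extract a policy."""
    def q(J, s, a):
        return cost(s, a) + sum(p * J[s2] for s2, p in TRANSITION[(s, a)].items())

    J = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        J_next = {s: min(q(J, s, a) for a in ACTIONS) for s in STATES}
        converged = max(abs(J_next[s] - J[s]) for s in STATES) < tol
        J = J_next
        if converged:
            break
    # Equation 10: choose the action that attains the minimum in each state.
    policy = {s: min(ACTIONS, key=lambda a: q(J, s, a)) for s in STATES}
    return J, policy

J_cheap, policy_cheap = value_iteration(make_cost(0.1))    # alerting is cheap
J_costly, policy_costly = value_iteration(make_cost(0.5))  # alerting is expensive
```

With the cheaper alert the extracted policy alerts in the `threat` state; with the more expensive alert it waits, which illustrates how the alert cost trades alerts against conflicts.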


Generalizing Equation 8 to handle belief states instead of states is not difficult and is shown
elsewhere [19]. However, solving for the optimal cost-to-go function, and hence the policy, is very
difficult in general. A variety of approximation methods have been proposed in the literature [36,37]
and have been applied to airborne collision avoidance [38]. These methods plan over a sampling of
the space of belief states and appear to scale well to problems of small to moderate size.

The  approach  explored  in  this  report  avoids  planning  over  the  space  of  belief  states  by  making 
the  following  approximation: 


$$\pi^*(b) = \arg\min_{a} \sum_{s} b(s) \left[ c(s, a) + \sum_{s'} \Pr(s' \mid s, a)\, J^*(s') \right]. \tag{11}$$


Equation 11 is similar to Equation 10, except that the immediate cost and expected future cost
are weighted by the distribution encoded by the belief state. This approximation is known as the
QMDP value method [39,40]. It amounts to assuming that all state uncertainty will vanish at the
next step. This approximation appears to work well for many applications but tends to fail in
problems where taking particular actions results in a significant reduction in state uncertainty. For
the TCAS sensor, as well as GPS-based sensors such as ADS-B, the particular action taken by
the collision avoidance system will have a negligible impact on state uncertainty. Nevertheless, it
would be interesting to compare the performance of QMDP against policies found by algorithms
that plan over the belief space, such as HSVI [36] and SARSOP [37].
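
The QMDP action selection in Equation 11 can be sketched as follows. The sketch assumes that the optimal cost-to-go `J_STAR` has already been computed for the underlying fully observable problem; the states (including a hypothetical `benign` state in which no conflict can occur), the transition table, and the alert cost are all invented for illustration.

```python
LAMBDA = 0.1  # hypothetical alert cost
ACTIONS = ["wait", "alert"]
TRANSITION = {
    ("threat", "wait"): {"conflict": 0.5, "done": 0.5},
    ("threat", "alert"): {"conflict": 0.1, "done": 0.9},
    ("benign", "wait"): {"done": 1.0},
    ("benign", "alert"): {"done": 1.0},
}
# Assumed precomputed optimal cost-to-go for the fully observable toy problem.
J_STAR = {"threat": 0.2, "benign": 0.0, "conflict": 1.0, "done": 0.0}

def cost(s, a):
    """Immediate cost: one in conflict, LAMBDA whenever an alert is issued."""
    return (1.0 if s == "conflict" else 0.0) + (LAMBDA if a == "alert" else 0.0)

def qmdp_action(belief):
    """Pick the action minimizing the belief-weighted cost in Equation 11."""
    def q(s, a):
        return cost(s, a) + sum(p * J_STAR[s2] for s2, p in TRANSITION[(s, a)].items())
    return min(ACTIONS, key=lambda a: sum(b * q(s, a) for s, b in belief.items()))

likely_threat = qmdp_action({"threat": 0.5, "benign": 0.5})    # expected costs: wait 0.25, alert 0.15
unlikely_threat = qmdp_action({"threat": 0.1, "benign": 0.9})  # expected costs: wait 0.05, alert 0.11
```

In this toy problem the belief-weighted rule alerts once the probability of `threat` exceeds 0.25, the point at which the expected costs of the two actions are equal.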


2.5  DISCUSSION 

This section has focused on logic optimization where the performance metric is a function of
conflict and alert rates, but other objectives may also be taken into consideration. For example,
Wolf explored using flight-plan deviation as a performance metric for unmanned aircraft collision
avoidance [41]. This report does not focus on explicitly minimizing flight-plan deviation for two
reasons. First, minimizing false alerts will result in fewer flight-plan deviations. Second, flight-plan
information is not an input into the current version of TCAS.

The problem formulation presented in this section is known as a Markov decision process
(MDP). Such problems have been well studied since the work by Bellman in the 1950s [42], and
several books treat the subject in depth [34,35,43-45]. When the state is not known exactly, the
formulation is called a partially observable Markov decision process (POMDP) [19]. MDPs and
POMDPs have been applied to a variety of different problems, including robotic motion planning
[46], agricultural management [47], medical diagnosis [48], and spoken dialog systems [49]. There
have also been several airborne collision avoidance applications for both manned and unmanned
aircraft [12,38,41,50-52].

Depending on the problem, it may be more natural to speak in terms of reward and value
functions, which are the opposites of cost and cost-to-go functions, respectively. Many of the
references provided in this report take the perspective of reward instead of cost. Although defining
the problem of collision avoidance in terms of reward is entirely valid, this report uses cost to avoid
having to work with negative rewards.

This  report  assumes  that  the  dynamic  and  observation  models  are  known.  The  dynamic 
model  can  be  inferred  from  recorded  operational  data  and  the  observation  model  can  be  based 
on  sensor  performance  specifications.  There  is  a  body  of  work,  however,  on  problems  where  the 
dynamic  model  is  unknown  and  performance  is  optimized  through  interaction  with  the  world.  This 
active  area  of  research  is  known  as  reinforcement  learning  [53,54].  Some  reinforcement  learning 
approaches  involve  learning  an  explicit  dynamic  model,  but  others  do  not.  Because  reinforcement 
learning  leads  to  changes  in  the  behavior  of  the  system  over  time  in  response  to  experience  in  the 
world,  this  kind  of  approach  may  not  be  appropriate  for  a  safety-critical  system  that  needs  to  be 
standardized  and  certified.  However,  some  of  the  techniques  can  be  applied  to  finding  approximately 
optimal  policies  offline  that  can  then  be  “frozen”  and  evaluated  just  like  those  found  using  dynamic 
programming. 

The  equations  in  this  section  assume  that  the  state  space  (and  hence  the  encounter  space)  is 
discrete,  allowing  the  summation  over  probability  masses  as  in  Equation  2,  repeated  here: 

$$\sum_{\omega} \Pr(\omega)\, c(\omega).$$

If the space is continuous, the summation would need to be changed to an integral and the
probability mass would need to be changed to a density:

$$\int p(\omega)\, c(\omega)\, d\omega. \tag{12}$$

This report assumes discrete state and encounter spaces to simplify notation. A continuous space
may be approximated arbitrarily well by a sufficiently fine discretization scheme. The next section
shows how to discretize the state space and apply dynamic programming.


3. DYNAMIC PROGRAMMING APPROACH


The previous section described how to formulate the problem of collision avoidance using
a decision-theoretic framework. This section shows how dynamic programming can be used to
find the optimal logic. There are a variety of techniques for applying dynamic programming to
a problem with continuous variables. This report focuses on a grid-based method that involves
discretizing the state space, as will be discussed in Section 3.1. The number of discrete states grows
exponentially with the dimensionality of the state space. Depending on the size of the discrete state
space, it might not be practical to compute the optimal action for every possible state. Section 3.2
shows how the planning complexity can be reduced by eliminating states not reachable from the
current state. Section 3.3 shows how to use branch-and-bound pruning to further eliminate states
that are determined to be unreachable under the optimal policy. Section 3.4 shows plots of the
optimal policy evaluated on different cross sections of the state space. Section 3.5 provides further
discussion of some of the concepts introduced in this section.


3.1  FITTED  VALUE  ITERATION 

Applying the value iteration algorithm (Section 2.4) is straightforward when the state space is
finite. The initial cost-to-go function $J_0$ can be represented as an array in memory, where each
element in the array corresponds to the cost-to-go for a particular state. The array is updated with
each application of the Bellman operator until convergence.

If the state space is continuous, it is no longer possible to represent the cost-to-go
function directly as a finite array. There are many strategies for representing the cost-to-go function,
including decision trees [55], neural networks [44], and self-organizing maps [56]. This report
focuses on a method called fitted value iteration that uses local averaging from a finite set of states
$S = \{s_1, \ldots, s_n\}$ selected from the continuous state space. This method was shown by Gordon to
provide a stable approximation of the optimal cost-to-go function [57,58].

A variety of different sampling schemes can be used to choose $S$. One common method is to
define $S$ as the vertices of a multidimensional grid spanning the state space. Table 3 shows the grid
edges used for the five-dimensional hypothetical collision avoidance problem (Section 1.3). This
discretization scheme results in $|S| = 2.14$ million states.
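
The quoted state count can be checked with a short calculation from the grid edges in Table 3. One value below is an assumption rather than something stated in the table: the RA state is taken to have 11 values (clear of conflict, plus climb and descend advisories with 4, 3, 2, 1, or 0 seconds remaining), since the table elides the full list.

```python
# Number of grid edges per continuous variable, from Table 3.
n_h = len(range(-1000, 1001, 100))     # relative altitude: -1000, -900, ..., 1000
n_tau = len(range(0, 21))              # time to closest horizontal approach: 0, 1, ..., 20
n_rate = len(range(-2500, 2501, 250))  # each vertical rate: -2500, -2250, ..., 2500

# Assumption: 1 clear-of-conflict state + 5 climb states + 5 descend states.
n_ra = 11

total_states = n_h * n_tau * n_rate * n_rate * n_ra  # 21**4 * 11 = 2,139,291
```

Under that assumption the product is about 2.14 million, consistent with the figure quoted above.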

The cost-to-go function $J$ is represented as an array of $|S|$ elements, where the $i$th element
corresponds to $J(s_i)$. To compute $J(x)$ at an arbitrary state $x$, a variety of different interpolation
schemes can be used. Appendix C discusses several interpolation methods that were evaluated on
the hypothetical collision avoidance problem. One method that was found to work particularly well
is multilinear interpolation, which computes $J(x)$ by taking the weighted average of the cost-to-go
function at the vertices of the grid cell (hyper-rectangle) that encloses $x$. The weight assigned to a
particular vertex is related to its distance from $x$.
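
Multilinear interpolation on a rectangular grid can be sketched as follows. This is a generic textbook formulation, not necessarily the implementation evaluated in Appendix C; the function name, the dictionary representation of grid values, and the clamping of out-of-range points to the grid boundary are assumptions made for this example.

```python
from bisect import bisect_right
from itertools import product

def multilinear_interpolate(grid_edges, values, x):
    """Interpolate at point x from values stored at grid vertices.

    grid_edges: one sorted list of edges per dimension (at least two edges each)
    values:     {index tuple: value at that vertex}
    """
    lower, frac = [], []
    for edges, xi in zip(grid_edges, x):
        xi = min(max(xi, edges[0]), edges[-1])               # clamp to the grid
        i = min(bisect_right(edges, xi) - 1, len(edges) - 2)  # lower edge of the cell
        lower.append(i)
        frac.append((xi - edges[i]) / (edges[i + 1] - edges[i]))
    total = 0.0
    for corner in product((0, 1), repeat=len(x)):  # the 2^d vertices of the cell
        weight = 1.0
        for bit, f in zip(corner, frac):
            weight *= f if bit else 1.0 - f        # nearer vertices get more weight
        index = tuple(i + bit for i, bit in zip(lower, corner))
        total += weight * values[index]
    return total

# Two-dimensional check against a function that is linear in each coordinate.
h_edges = [-1000.0, -900.0, -800.0]
tau_edges = [0.0, 1.0, 2.0]
values = {
    (i, j): 2.0 * h + 3.0 * t
    for i, h in enumerate(h_edges)
    for j, t in enumerate(tau_edges)
}
estimate = multilinear_interpolate([h_edges, tau_edges], values, (-930.0, 1.25))
```

The vertex weights within a cell are nonnegative and sum to one, so the interpolated value is a weighted average of the surrounding vertex values.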

TABLE 3

Grid edges for the hypothetical collision avoidance problem.

Variable                               Symbol         Grid Edges
Relative altitude                      $h$            -1000, -900, ..., 1000
Time to closest horizontal approach    $\tau$         0, 1, ..., 20
Own vertical rate                      $\dot{h}_1$    -2500, -2250, ..., 2500
Intruder vertical rate                 $\dot{h}_2$    -2500, -2250, ..., 2500
RA state                               $s_{RA}$       clear of conflict, climb in 4 s, descend in 4 s, ...

Fitted value iteration begins by initializing all elements in the array representing $J$ to zero.
The algorithm applies the Bellman update operator $B$ (Section 2.4) to $J$ at the states in $S$. The
Bellman update operator generalized for a continuous state space is as follows:


$$BJ(s) = \min_{a} \left[ c(s, a) + \int p(x' \mid s, a)\, J(x')\, dx' \right]. \tag{13}$$


The integral on the right-hand side is the expectation of $J$ over states selected from $p(\cdot \mid s, a)$, which
is the distribution over states at the next time step given that action $a$ is executed from the current
state $s$. In general, an analytical solution to this integral does not exist, but it may be approximated
using sampling methods as discussed in Appendix D. Sigma-point sampling, which relies on a small
number of deterministically chosen samples, works particularly well on the hypothetical problem
as shown in Section 5.3 (Figure 21).


The  update  operator  when  sampling  from  the  next  state  distribution  (Appendix  D)  and 
interpolating  the  cost-to-go  function  (Appendix  C)  reduces  to 


$$BJ(s) = \min_{a} \left[ c(s, a) + \sum_{s' \in S} T(s' \mid s, a)\, J(s') \right], \tag{14}$$


where $T(s' \mid s, a)$ is a discrete transition probability function defined over states in $S$. Repeated
application of this update operator results in the (approximately) optimal cost-to-go function
$J^*$, which may be used to extract an (approximately) optimal policy $\pi^*$. The quality of the
approximation depends on the level of discretization, interpolation method, and sampling scheme.
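
One way to see how sampling and interpolation reduce the integral to a sum over grid states is to spread each weighted sample of the next state across its neighboring grid vertices. The one-dimensional sketch below is an illustration under assumptions: the weighted samples are made up (they are not sigma points from Appendix D), linear interpolation stands in for the multilinear scheme, and the function names are hypothetical.

```python
from bisect import bisect_right

def interpolation_weights(edges, x):
    """Return {grid index: weight} for linear interpolation at x on a 1-D grid."""
    x = min(max(x, edges[0]), edges[-1])
    i = min(bisect_right(edges, x) - 1, len(edges) - 2)
    f = (x - edges[i]) / (edges[i + 1] - edges[i])
    return {i: 1.0 - f, i + 1: f}

def discrete_transition(edges, samples):
    """Combine sample weights and interpolation weights into T(s' | s, a).

    samples: list of (next_state, sample_weight) with weights summing to one,
             all drawn for one fixed state-action pair (s, a).
    """
    T = {}
    for x_next, w in samples:
        for idx, beta in interpolation_weights(edges, x_next).items():
            T[idx] = T.get(idx, 0.0) + w * beta
    return T

edges = [0.0, 100.0, 200.0]
J = [0.0, 1.0, 4.0]                                   # cost-to-go at the grid vertices
samples = [(50.0, 0.5), (150.0, 0.25), (100.0, 0.25)]  # hypothetical weighted samples
T = discrete_transition(edges, samples)
expected_J = sum(prob * J[idx] for idx, prob in T.items())
```

The resulting `T` sums to one, and the expectation computed from it matches the weighted average of the interpolated cost-to-go at the samples (1.125 in this example).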


To  determine  the  optimal  policy  from  an  arbitrary  state  x,  simply  apply  the  continuous 
state-space  generalization  of  Equation  10: 


$$\pi^*(x) = \arg\min_{a} \left[ c(x, a) + \int p(x' \mid x, a)\, J^*(x')\, dx' \right]. \tag{15}$$


Again,  the  integral  on  the  right-hand  side  can  be  evaluated  using  sigma-point  sampling  or  some 
other  sampling  scheme. 


3.2  REACHABLE  STATE  SPACE 

The relatively coarse discretization used for the hypothetical collision avoidance problem (Table 3)
results in $|S| = 2.14$ million states, which is manageable: value iteration can be done within seconds
using a standard desktop computer. However, increasing the dimensionality of the state space to
include multiple intruders or other features such as intruder bearing can result in exponential
growth of the number of states. One approach to reduce the number of discrete states used in
value iteration is to only consider states reachable from the current state. The number of states
reachable from a given state is typically a small fraction of the total state space.

Figure 4. Notional diagram showing reachable states under the optimal policy $\pi^*$ and reachable states under
any policy $\pi$.

The procedure for determining the optimal action from the current state $x$ begins by
incrementally constructing a set of discrete states. The set is initialized with the discrete states
reachable after a single step following any action. Then, the discrete states that are reachable from
the discrete states already in the set are added to the set. The process repeats recursively until
there are no longer any new states to be added to the set. Once a complete set of reachable discrete
states is obtained, value iteration can be used to find $J^*$ at these states and the optimal action
from $x$ can be determined using Equation 15.
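
The incremental construction of the reachable set can be sketched as a breadth-first search. The toy successor function below is an assumption made for illustration: states are pairs of an altitude index and a time index, the time index decreases by one each step, and each action can shift the altitude index by at most one. In the actual problem, the successors would be the grid vertices that receive weight when the sampled next states are interpolated.

```python
from collections import deque

def reachable_states(x, actions, successors):
    """Collect every discrete state reachable from x in one or more steps."""
    reached = set()
    frontier = deque()

    def expand(state):
        for a in actions:
            for s_next in successors(state, a):
                if s_next not in reached:
                    reached.add(s_next)
                    frontier.append(s_next)

    expand(x)        # states reachable after a single step under any action
    while frontier:  # repeat until no new states are added
        expand(frontier.popleft())
    return reached

H_MAX, TAU_MAX = 20, 5  # hypothetical grid: 21 altitude indices, 6 time indices

def successors(state, action):
    """Toy dynamics: time counts down and altitude moves by at most one index."""
    h, tau = state
    if tau == 0:
        return []
    steps = {"climb": (0, 1), "descend": (-1, 0), "none": (0,)}[action]
    return [(min(max(h + d, 0), H_MAX), tau - 1) for d in steps]

reached = reachable_states((10, 5), ["climb", "descend", "none"], successors)
fraction = len(reached) / ((H_MAX + 1) * (TAU_MAX + 1))
```

In this toy grid only 35 of the 126 states are reachable from the chosen start, so value iteration restricted to the reachable set touches about 28% of the grid.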

Figure 5 shows the distribution over the number of reachable states from 10,000 randomly
sampled initial states. The mean number of reachable states is approximately 838,606 states, which
is 39% of the entire state space. For more complex problems, the savings from only planning over
the reachable states can be even more significant.


3.3  BRANCH  AND  BOUND 

The number of reachable states under any policy is generally small compared to the full state space,
but the number of reachable states under the optimal policy can be even smaller as illustrated in
Figure 4. Although finding the set of states reachable under the optimal policy requires knowing
the optimal policy, a branch-and-bound method can come close to restricting the planning effort to
states reachable under the optimal policy. The branch-and-bound method computes the optimal
policy with a $k$-step lookahead. If the time to horizontal closest approach is 20 seconds in the
simple two-dimensional problem, then $k$ should be 20.

The branch-and-bound algorithm involves computing

$$J^*(s, a) = c(s, a) + \sum_{s'} T(s' \mid s, a)\, J^*(s'), \tag{16}$$

which is the cost-to-go from state $s$ when executing $a$ from the current state and then following the
optimal policy from that point on. In order to prune suboptimal actions, the algorithm requires
knowledge of a lower bound on $J^*(s, a)$. The lower bound does not necessarily have to be tight,
although tighter bounds can lead to more pruning. In the hypothetical collision avoidance problem,
a lower bound on $J^*(s, a)$ is $\underline{J}^*(s, a) = C(s) + \lambda A(a)$, where $C(s) = 1$ if $s$ is in conflict (zero
otherwise) and $A(a) = 1$ if $a$ is an alert (zero otherwise).

Figure 5. Effect of branch-and-bound pruning.

Given some state $s$, the branch-and-bound algorithm recursively computes $\pi^*(s)$ and $J^*(s)$,
where $\pi^*$ is the optimal policy and $J^*$ is the optimal cost-to-go function. If the lookahead is zero,
then $\pi^*(s) = \arg\min_a c(s, a)$ and $J^*(s) = \min_a c(s, a)$. If the lookahead is $k$ and the available
actions are $a_1, \ldots, a_n$ (sorted in ascending order of the lower bound $\underline{J}(s, a_i)$), it first computes
$J^*(s, a_1)$ using Equation 16. The $J^*(s')$ on the right-hand side of Equation 16 is recursively computed
using the branch-and-bound algorithm with a lookahead of $k - 1$. If $J^*(s, a_1) < \underline{J}(s, a_2)$, then the
algorithm will prune $a_2, \ldots, a_n$ from consideration since it knows that choosing any of those actions
will not lower the expected cost. If, on the other hand, $J^*(s, a_1) > \underline{J}(s, a_2)$, then it is worth
computing $J^*(s, a_2)$ recursively. The process continues until the remaining actions have been explored
or pruned from consideration. The optimal action $\pi^*(s)$ is the action that provides the lowest value
for $J^*(s, a)$.
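
The recursion can be illustrated with a short sketch. It is not the code used to produce the results in this report: the function signature, the toy integer-grid model, and the choice to prune when the incumbent value equals the lower bound (a case the description above leaves open) are assumptions made for the example. The lower bound passed in is the immediate cost, which is valid because all future costs are non-negative.

```python
def branch_and_bound(s, k, actions, cost, transitions, lower_bound):
    """Return (best_action, cost_to_go) for state s with a k-step lookahead.

    cost(s, a) is the immediate cost, transitions(s, a) returns a list of
    (probability, next_state) pairs, and lower_bound(s, a) bounds J*(s, a)
    from below. All three are caller-supplied (hypothetical) model functions.
    """
    if k == 0:
        best = min(actions, key=lambda a: cost(s, a))
        return best, cost(s, best)
    best_action, best_value = None, float("inf")
    # Explore actions in ascending order of their lower bounds.
    for a in sorted(actions, key=lambda a: lower_bound(s, a)):
        # Prune: this action and all later ones cannot beat the incumbent.
        if best_value <= lower_bound(s, a):
            break
        value = cost(s, a)
        for prob, s_next in transitions(s, a):
            value += prob * branch_and_bound(
                s_next, k - 1, actions, cost, transitions, lower_bound)[1]
        if value < best_value:
            best_action, best_value = a, value
    return best_action, best_value


# Toy model (assumed): state = (h, tau); conflict means h == 0 at tau == 0.
LAMBDA = 0.1
ACTIONS = ["none", "climb", "descend"]


def toy_cost(s, a):
    h, tau = s
    conflict = 1.0 if (tau == 0 and h == 0) else 0.0
    return conflict + (LAMBDA if a != "none" else 0.0)


def toy_transitions(s, a):
    h, tau = s
    if tau == 0:
        return []  # terminal state
    drift = {"none": 0, "climb": -1, "descend": 1}[a]
    return [(0.25, (h + drift - 1, tau - 1)),
            (0.50, (h + drift, tau - 1)),
            (0.25, (h + drift + 1, tau - 1))]
```

A call such as `branch_and_bound((0, 3), 3, ACTIONS, toy_cost, toy_transitions, toy_cost)` returns the action and cost-to-go for the toy state with a three-step lookahead. Tighter lower bounds than the immediate cost would prune more branches.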

Figure  5  shows  the  reduction  in  the  number  of  reachable  states  due  to  branch-and-bound 
pruning.  The  mean  number  of  reachable  states  was  reduced  to  475,279  states,  which  represents 
22%  of  the  state  space.  The  savings  can  be  even  more  significant  when  there  are  more  actions  (i.e., 
more  kinds  of  alerts)  from  which  to  choose  (Section  7.7). 


3.4 POLICY PLOTS


To  obtain  an  approximation  to  the  optimal  policy  of  Equation  15,  the  state  space  was  discretized 
and  fitted  value  iteration  was  performed.  The  integral  of  Equation  15  was  approximated  as  a  sum 
over  the  next  state  distribution,  evaluated  using  sigma-point  sampling  and  multilinear  interpolation. 
The cost of alerting $\lambda$ was arbitrarily chosen to be 0.1.

Figure 6 shows the policy for different $h$-$\tau$ cross sections of the state space. The cross sections
are defined by the tuple $(\dot{h}_1, \dot{h}_2, s_{RA})$, whose values are supplied in the subfigure captions.
Overlaying the plots are three trajectories starting from a variety of states. The trajectories
correspond to the projected noiseless motion of the aircraft while executing each of the three
actions. After the five-second
pilot-response  delay,  the  trajectory  moves  downward  (to  smaller  values  of  h)  while  executing  the 
Climb  action  and  upward  (to  larger  values  of  h)  while  executing  the  Descend  action.  The  noiseless 
motion  of  the  aircraft  without  alerting,  called  the  nominal  trajectory,  follows  a  straight  line.  A 
trajectory  is  colored  red  if  it  terminates  in  conflict. 

The  policies  largely  agree  with  intuition.  For  instance,  when  both  aircraft  are  flying  level, 
as  in  Figure  6(a),  the  policy  is  symmetrical,  indicating  the  best  action  for  the  own  aircraft  is  to 
descend if the intruder is close enough above and to climb if the intruder is close enough below.
This is because, although the aircraft are flying level, the noisiness in the aircraft vertical rates does
have the potential of causing a conflict when the aircraft are near each other. Effective placement
of  the  alerting  boundaries,  however,  is  not  intuitive.  Through  the  use  of  dynamic  programming, 
the  placement  of  the  alerting  boundaries  was  optimized  to  minimize  the  expected  cost.  Because  the 
policy  generation  process  does  not  rely  on  random  sampling,  the  alerting  boundaries  are  smooth 
and  well  defined. 

The  other  plots,  which  represent  the  policy  for  slices  in  which  the  aircraft  are  either  climbing 
or  descending,  are  not  symmetrical.  In  Figure  6(b),  for  example,  because  the  own  aircraft  is  flying 
at 1000 ft/min, issuing a descend RA has a greater effect on the own vertical rate than issuing a
climb RA, as the simulated trajectories serve to show. Therefore, in cases when an alert is necessary,
the policy favors descending. However, in some cases, it is still necessary to climb. For example, in
some  regions  where  the  intruder  is  close  enough  above  the  own  aircraft,  the  policy  suggests  climbing. 
This  is  similar  to  an  altitude-crossing  RA.  As  expected,  the  policy  rarely  advises  a  maneuver  when 
the  own  aircraft  is  above  the  intruder. 

A  striking  feature  of  the  policies  is  the  conspicuous  absence  of  any  alert  when  there  is  little 
time  until  horizontal  closest  approach  and  the  aircraft  are  closely  separated  in  altitude.  Intuition 
would  seem  to  suggest  that  this  is  a  crucial  time  at  which  an  alerting  system  should  alert  to  reduce 
the  threat  of  conflict.  However,  because  the  hypothetical  collision  avoidance  problem  assumes 
that  the  pilot  takes  five  seconds  to  respond  to  the  RA,  the  aircraft  are  very  likely  to  come  into 
conflict  with  or  without  an  RA.  In  this  case,  there  is  no  incentive  to  alert  because  the  expected  cost 
without  alerting  is  actually  lower  than  the  expected  cost  with  alerting.  As  the  pilot-response  delay 
is decreased, the alert boundary moves to the left. If the pilot-response delay is made probabilistic
with  some  probability  of  immediate  response,  the  alerting  region  may  extend  fully  to  the  left 
(Section  7.2). 

[Figure 6 panels: (a) 0 ft/min, 0 ft/min, no RA; (b) 1000 ft/min, 0 ft/min, no RA; (c) −1000 ft/min, 500 ft/min, no RA; (d) 1000 ft/min, 1000 ft/min, no RA. Legend: Descend, Climb.]

Figure 6. Collision avoidance policies generated using dynamic programming. The caption of each subfigure
indicates the cross section $(\dot{h}_1, \dot{h}_2, s_{RA})$ for which the policy is evaluated. The horizontal axis represents $\tau$
in seconds, and the vertical axis represents $h$ in feet.

In each of the plots, although the nominal trajectory does not result in conflict, the policy
recommends executing an action. Due to the noise in the system, the relative path of the intruder
will deviate, however slightly, from the projected straight-line path. The introduction of this noise
causes the expected probability of conflict in the future to be non-zero. The expected cost while
following the RA trajectory, in turn, is less than the expected cost while following the nominal
trajectory.


3.5  DISCUSSION 

This section discussed how to compute the optimal policy offline (Section 3.1) and how to compute
the optimal action from the current state online (Sections 3.2 and 3.3). Offline solution methods
involve computing a representation of the optimal policy from all possible states. The policy
representation is then used online with minimal computational effort. Online solution methods
involve computing the best action from the current state during execution. In general, online
solution methods require much more computational effort while making decisions and can only
plan for contingencies within a finite time horizon.

There are many other online solution methods besides the ones discussed here [59], including
methods like LAO* that use heuristics to reduce the planning space [60]. It is important to note
that online solution methods can be used offline to determine the optimal action from a sampling of
states. The optimal policy can be approximated by a classifier trained on the sampled states [61,62].
The classifier can be compactly represented, for example, as a support vector machine, neural
network, decision tree, or human-readable logic. For TCAS, an online solution method can be used
to aid human designers in tuning the parameters of existing logic to match the optimal policy at a
selection of states.

Many offline methods, such as fitted value iteration, involve modeling the cost-to-go function
over the entire state space using local approximation. This section discussed one way to model the
cost-to-go function by interpolating values at the vertices of a multidimensional grid spanning the
state space. The experiments discussed in this report use grids with regularly-spaced edges, but
for some problems it may be desirable to increase the resolution in some regions of the state space
where it is needed to better approximate the cost-to-go function, while leaving the resolution coarse
in other regions to reduce memory and computational requirements. Munos and Moore show how
to dynamically infer a suitable multiresolution representation [63].

An alternative to local approximation of the cost-to-go function is global approximation using
a parametric representation, such as a neural network. Such an approach has been studied extensively
in the reinforcement learning community and has been successfully applied to a variety of
problems [44]. If development of the grid-based local approximation method pursued in this report
is found to have difficulty scaling to higher dimensions, a global parametric representation of the
cost-to-go function may be a way forward.

Approximate dynamic programming is an active area of research, and there have been many
important advances in recent years that allow these methods to scale to increasingly complex problems
[64]. This report explores only a handful of methods that were relatively simple to implement and
appeared to have the most promise. Increasing the complexity of the model to include multiple
intruders, for example, may require adopting or extending other solution methods.


4. CONFLICT PROBABILITY ESTIMATION APPROACH


Although dynamic programming is a powerful optimization method, it can be difficult to
scale to problems with many state variables. An alternative method that may scale more gracefully
involves estimating the probability of conflict from the current state when issuing different alerts.
This differs from TCAS, which uses temporal and spatial criteria to issue alerts (Section 1.2).
A conflict probability estimation approach has been applied to a prototype alerting system for
free flight [65] and an alerting logic for closely spaced parallel approaches [66]. In the late 1980s,
researchers at Lincoln Laboratory investigated a conflict probability estimation approach for TCAS
III [67].

The  ability  to  accurately  estimate  the  probability  of  conflict  upon  which  these  alerting  systems 
are  predicated  is  crucial  if  they  are  to  properly  detect  conflicts.  For  example,  a  system  that 
overestimates  the  conflict  probability  is  likely  to  have  a  high  false  alarm  rate,  while  one  that 
underestimates  the  conflict  probability  is  likely  to  have  a  high  rate  of  missed  or  late  detection  [26]. 
Much  research,  therefore,  has  focused  on  conflict  probability  estimation  based  on  analytic  [68,69], 
numerical  approximation  [70,71],  and  Monte  Carlo  methods  [72]. 

Conflict probability estimates based on analytic or numerical methods require strong assumptions
about the form of the dynamics, typically that the dynamics are adequately described
by linear-Gaussian equations. Monte Carlo simulation allows much greater flexibility in modeling,
although the computational demands can become prohibitively high. Estimating the probability
of a rare event such as aircraft conflict with a high level of confidence requires a large number of
samples. A variance-reduction technique known as importance sampling can deliver an estimator
that uses significantly fewer samples than direct sampling while providing the same accuracy.
Dynamic programming can also be used to estimate conflict probability, but such an approach requires
discretization of the state space.

This section discusses and compares the performance of conflict probability estimation methods
based on analytic approximation, dynamic programming, and Monte Carlo sampling. There
are conceivably many different alerting schemes that can be constructed based on the probability
of conflict. This section discusses three of them and presents plots of the resulting policies for slices
through the state space.


4.1  ANALYTICAL  APPROXIMATION 

The  dynamics  of  the  aircraft  in  the  hypothetical  collision  avoidance  problem  can  be  approximated 
by  a  linear-Gaussian  system  (Appendix  B)  that  has  two  discrete  modes:  no-RA  execution  mode 
and  RA  execution  mode.  By  definition,  a  switch  to  RA  execution  mode  occurs  when  own  aircraft, 
at  any  time,  begins  to  apply  a  0.25-g  acceleration  to  reach  the  RA  target  vertical  rate.  During  RA 
execution,  the  motion  of  own  aircraft  becomes  deterministic.  The  motion  of  the  intruder  remains 
probabilistic. 

Starting from an arbitrary initial state, the distribution representing the state uncertainty
can be propagated forward in time using the equations in Appendix B until $\tau = 0$. The probability
of conflict can be estimated by integrating the Gaussian distribution over $h$ at $\tau = 0$ from −100 ft
to +100 ft. Figure 7 illustrates the evolution of the distribution of $h$ until the closest point of
approach, where the area under the curve between −100 ft and +100 ft is the estimate of the
probability of conflict.

Figure 7. Evolution of the Gaussian distribution of $h$ until CPA.
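
The final integration step has a closed form in terms of the Gaussian cumulative distribution function. The following sketch assumes the mean and standard deviation of $h$ at $\tau = 0$ have already been obtained from the propagation; the function name and arguments are illustrative and do not come from this report.

```python
import math


def conflict_probability(mean_h, std_h, half_width=100.0):
    """Pr(|h| < half_width) for h ~ N(mean_h, std_h^2) at closest approach."""
    def cdf(x):
        # Gaussian CDF written with the error function.
        return 0.5 * (1.0 + math.erf((x - mean_h) / (std_h * math.sqrt(2.0))))
    return cdf(half_width) - cdf(-half_width)
```

For instance, a predicted relative altitude centered at zero with a 100 ft standard deviation places roughly the one-sigma mass of the distribution inside the conflict band.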

Another  method  of  propagating  the  Gaussian  distribution  of  the  aircraft  state  relies  on  sigma- 
point  sampling  (Appendix  D)  in  which,  at  each  time  step,  the  distribution  is  approximated  by 
deterministically  chosen  sample  points  that  are  propagated  through  the  true  nonlinear  system 
dynamics.  The  resulting  points  approximate  the  distribution  at  the  next  time  step. 


4.2  DYNAMIC  PROGRAMMING 

Fitted value iteration (Section 3) can be used to estimate the probability of conflict for not alerting,
issuing a Climb RA, and issuing a Descend RA. The state space can be discretized using the same
scheme used earlier (Table 3). The cost function $c(s)$ is one if $s$ is a conflict state ($\tau = 0$ and
$|h| < 100$ ft) and zero otherwise. The Bellman update operator for computing the probability of
conflict for action $a$ at a discrete state $s$ is

$$BJ(s, a) = c(s) + \int p(x' \mid s, a)\, J(x')\, dx', \tag{17}$$

which, as suggested in Section 3, can be approximated using sigma-point sampling and multilinear
interpolation. The repeated application of the Bellman update operator leads to an estimate of the
cost-to-go function that is equivalent to the probability of conflict from $s$ following $a$.

4.3  MONTE  CARLO 

An alternative method for estimating the probability of conflict relies on Monte Carlo sampling.
For example, a number of trajectories can be sampled from the probabilistic dynamic model
(Appendix A). The probability of conflict is estimated as the fraction of trajectories that result
in conflict. This is known as the direct Monte Carlo method, or direct sampling. The accuracy of
the estimate increases with the number of sample trajectories.

Figure 8. Comparison of state propagation using direct sampling versus importance sampling.

However, because the event of a conflict is typically rare, performing direct Monte Carlo
sampling in this fashion is inefficient. Sampling the trajectories from a different distribution, called
the proposal distribution, such that sample trajectories result in conflict with high probability would
lead to increased efficiency in estimating the probability of conflict. This is known as importance
sampling. To obtain an unbiased estimate of the probability of conflict, the sample trajectories
must be weighted by the ratio of their likelihood under the original distribution to their likelihood
under the proposal distribution. Figure 8 highlights the difference between direct sampling and
importance sampling. The fact that few trajectories terminate in conflict using the direct sampling
method makes it a poor estimator of conflict probability [73]. Nearly all the trajectories produced by
importance sampling, however, result in conflict and therefore contribute to the conflict probability
estimate.
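
The difference between the two estimators can be seen on a scalar toy problem. The sketch below does not use the aircraft model of this report: it estimates the probability that a standard Gaussian variable exceeds a threshold, with a proposal distribution shifted to the threshold. The function names and the choice of proposal are assumptions made for illustration.

```python
import math
import random


def direct_estimate(n, threshold, rng):
    """Fraction of samples from N(0, 1) that exceed the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def importance_estimate(n, threshold, rng):
    """Sample from the proposal N(threshold, 1) and reweight each sample.

    Returns the estimate and its standard error (sample std / sqrt(n)).
    """
    weights = []
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        # Likelihood ratio of N(0, 1) to N(threshold, 1), evaluated at x.
        w = math.exp(-threshold * x + 0.5 * threshold ** 2)
        weights.append(w if x > threshold else 0.0)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / (n - 1)
    return mean, math.sqrt(var / n)


rng = random.Random(0)
print(direct_estimate(20000, 4.0, rng))
print(importance_estimate(20000, 4.0, rng))
```

With a threshold of 4, direct sampling rarely observes the event at all, while about half of the proposal samples fall in the event region and contribute a weighted term to the estimate.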

The  following  sections  outline  several  proposal  distributions  that  were  explored.  Appendix  E 
provides  a  more  detailed,  mathematical  discussion  of  the  proposal  distributions. 

4.3.1  Constant  Acceleration  Proposal 

The  aircraft  in  the  model  experience  random  vertical  accelerations  in  the  form  of  zero-mean 
Gaussian  noise.  The  history  of  vertical  accelerations,  called  the  control  trajectory,  uniquely  specifies 
a  state  trajectory.  Typically,  these  accelerations  do  not  produce  trajectories  that  result  in  conflict, 
except  when  the  start  state  is  already  near  conflict.  However,  the  zero-mean  Gaussian  distribution 
from  which  the  accelerations  are  sampled  can  be  adjusted  to  artificially  induce  conflict  trajectories. 
At  each  time  step  along  the  state  trajectory,  one  may  calculate  the  accelerations  that,  if  applied 
constantly  until  CPA,  would  force  the  aircraft  into  conflict  but  that  disturb  the  flight  path  of  the 
aircraft  as  little  as  possible.  These  constant  accelerations  can  serve  as  the  mean  of  the  Gaussian 
distribution from which to sample the vertical accelerations at each time step. If this process
continues  until  CPA,  it  is  likely  that  the  resulting  state  trajectory  will  result  in  conflict. 

4.3.2  Maximum-Likelihood  Acceleration  Proposal 

A good proposal distribution should produce samples that result in conflict while still resembling,
as much as possible, the distribution of the model [73]. Because the accelerations in the
model are zero-mean, the mean of the proposal distribution should be as close to zero as possible
while still producing trajectories that terminate in conflict. Finding the sequence of accelerations
can be framed as an optimization problem where the objective is to minimize the squared norm of
the accelerations subject to the constraint that a conflict occurs at $\tau = 0$. After solving for the
acceleration  sequence,  the  first  set  of  accelerations  is  used  as  the  mean  of  the  proposal  distribution 
and  the  remaining  accelerations  are  discarded.  (In  this  way,  it  is  similar  to  model  predictive  control 
used  in  online  trajectory  planning  [74].)  As  before,  this  process  continues  for  the  remainder  of  the 
encounter. 
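
Under simplifying assumptions, the minimum-norm problem has a closed-form solution. The sketch below is not the formulation of Appendix E: it collapses the two aircraft into a single relative altitude $h$ and relative vertical rate, uses a discrete-time double integrator, and treats conflict as ending within a band of half-width 100 ft. The function name and parameters are hypothetical.

```python
def min_norm_accelerations(h0, v0, n_steps, dt=1.0, half_width=100.0):
    """Least-norm acceleration sequence that ends inside the conflict band.

    Assumed model: h[k+1] = h[k] + v[k]*dt + 0.5*a[k]*dt**2 and
    v[k+1] = v[k] + a[k]*dt, with conflict meaning |h| <= half_width
    after n_steps steps.
    """
    nominal = h0 + v0 * n_steps * dt
    # Closest point of the conflict band to the nominal final altitude.
    target = max(-half_width, min(half_width, nominal))
    gap = target - nominal
    # Sensitivity of the final altitude to the acceleration applied at step k.
    w = [dt * dt * (n_steps - k - 0.5) for k in range(n_steps)]
    # Minimizing sum(a**2) subject to sum(w*a) == gap gives a proportional to w.
    scale = gap / sum(wk * wk for wk in w)
    return [scale * wk for wk in w]
```

In the receding-horizon scheme described above, only the first element of the returned sequence would serve as the proposal mean, and the problem would be solved again from the next sampled state.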

4.3.3  Analytic  Proposal 

It  can  be  shown  that  the  optimal  proposal  distribution  from  which  to  sample  the  accelerations 
at  a  particular  time  is  proportional  to  the  product  of  the  probability  of  conflict  at  the  next  time 
step  with  the  distribution  of  the  accelerations  according  to  the  model.  Of  course,  it  is  not  possible 
to  actually  calculate  the  optimal  proposal  distribution  because  it  requires  that  the  probability  of 
conflict be known. However, by using an estimate of the probability of conflict in place of its true
value, the optimal proposal distribution can be approximated. An effective proposal distribution,
though still suboptimal, is a Gaussian distribution whose mean is approximately equal to the mean
of the optimal proposal distribution, which can be obtained by using the probability of conflict
estimate afforded by the analytic approximation described earlier.

4.3.4  Dynamic  Programming  Proposal 

The  dynamic  programming  proposal  distribution  is  identical  to  that  of  the  analytic  proposal 
distribution  except  that  the  dynamic  programming  estimate  of  the  probability  of  conflict  is  used 
to  determine  the  mean  of  the  distribution. 


4.4  ESTIMATION  COMPARISON 

This  section  presents  results  demonstrating  the  performance  of  the  various  estimation  methods 
described  above.  The  following  abbreviations  refer  to  the  estimation  methods:  Analytic,  DP, 
Direct,  IS-Constant,  IS-ML,  IS-Analytic,  and  IS-DP.  The  performance  of  an  estimator  is  related 
to how quickly the estimator converges to a stationary estimate. The standard error (SE) of the
estimator is $\sigma/\sqrt{N}$, where $\sigma$ is the sample standard deviation and $N$ is the number of samples.

Figures 9 and 10 show convergence plots for $\Pr(C \mid x, a)$ and $\mathrm{SE}[\Pr(C \mid x, a)]$, estimates of
the probability of conflict and its standard error, from three different states while executing the no
alert and descend actions, respectively. The plots on the left show $\Pr(C \mid x, a)$ while the plots on
the right show $\mathrm{SE}[\Pr(C \mid x, a)]$. The tuples on the left panel indicate the states from which the
probability  of  conflict  is  estimated.  All  plots  show  convergence  for  up  to  1000  sample  trajectories 
except  for  the  last  two  plots  of  both  figures  for  which  more  trajectories  are  needed  because  the 
probability  of  conflict  is  low.  In  all  cases,  importance  sampling  converges  fastest,  and  the  analytic 
proposal  distribution  consistently  provides  the  lowest  standard  error.  When  the  probability  of 
conflict  is  low,  on  the  order  of  10~^  for  the  last  two  plots,  estimating  the  probability  using  direct 
sampling  is  inefficient,  requiring  many  more  samples  than  the  importance  sampling  estimators  to 
converge  to  a  reasonable  estimate.  Observe  that  in  some  of  the  plots,  certain  estimates  do  not 
appear  because  they  are  far  removed  from  the  other  estimates.  Moreover,  the  standard  error  for 
the  Analytic  and  DP  estimates  is  zero,  but  these  are  not  included  in  the  standard  error  plots. 

Note particularly that for the no alert action, the analytic approximation is quite accurate as
all estimators tend to approach its estimate as the number of sample trajectories increases. The
analytic approximation for the other actions, however, performs poorly due to its linear-Gaussian
approximation. However, using importance sampling with a proposal distribution derived from
the analytic approximation converges to values comparable to those of the other estimators with
low standard error. This result suggests that rough approximations of conflict probability can
still be exploited to generate proposal distributions that achieve lower variance than their direct
counterpart.

The accuracy of the dynamic programming estimates can be quite poor compared to the
other  methods  because  of  the  coarseness  of  the  discretization.  Several  experiments  have  shown 
that  increasing  the  granularity  of  the  discretization  improves  the  accuracy  of  its  conflict  probability 
estimates  at  the  expense  of  requiring  many  more  states. 


4.5  POLICY  GENERATION 

Several different strategies have been suggested for using conflict probability estimates to decide
when  to  alert.  This  report  focuses  on  the  following  three  strategies  for  constructing  policies: 


1. Conservative policy: Carpenter and Kuchar [66], in the development of logic for closely
spaced  parallel  approach,  suggested  alerting  when  the  probability  of  conflict  without  alerting 
exceeds  a  set  threshold.  The  alert  that  minimizes  the  probability  of  conflict  is  chosen.  If  the 
probability  of  conflict  is  minimal  without  alerting,  no  alert  is  issued.  This  policy  is  called 
conservative  because  it  alerts  no  later  (and  typically  much  earlier)  than  the  other  policies. 
(Figure  11(a)) 

2. Delay policy: In the late 1980s, Lincoln Laboratory researchers developing TCAS III proposed
waiting to alert until the conflict probabilities for all actions reach or exceed the threshold [67].
This strategy is referred to as the delay policy because it tries to delay alerting as
much as possible. (Figure 11(b))

3. Conservative delay policy: This strategy waits until there is a unique alert that has a
probability of conflict below the threshold. It typically alerts after the conservative policy
but before the delay policy. (Figure 11(c))

[Figure 9 panels, by initial state: (0 ft, 20 s, 1000 ft/min, 0 ft/min, no RA); (500 ft, 20 s, 1000 ft/min, 0 ft/min, no RA); (0 ft, 20 s, 0 ft/min, 0 ft/min, no RA). Columns: probability of conflict, standard error. Legend: Analytic, DP, Direct, IS-Constant, IS-ML, IS-Analytic, IS-DP.]

Figure 9. Convergence plots from several different states for the no alert action.

[Figure 10 panels, by initial state: (0 ft, 20 s, 1000 ft/min, 0 ft/min, no RA); (500 ft, 20 s, 1000 ft/min, 0 ft/min, no RA); (0 ft, 20 s, 0 ft/min, 0 ft/min, no RA). Columns: probability of conflict, standard error. Legend: Analytic, DP, Direct, IS-Constant, IS-ML, IS-Analytic, IS-DP.]

Figure 10. Convergence plots from different states for the descend action.

The  longer  alerts  are  delayed,  the  less  likely  the  alert  will  be  unnecessary  but  the  more  likely 
it  will  come  too  late  to  prevent  conflict.  Figure  12  illustrates  the  difference  in  alert  timing  for  the 
three  strategies  when  the  alert  threshold  is  set  to  0.1.  The  conservative  strategy  alerts  as  soon  as 
the  conflict  probability  when  not  alerting  reaches  the  threshold.  It  issues  a  climb  advisory  because 
it provides the lowest probability of conflict at that point in time. The conservative delay strategy
waits  until  there  is  a  unique  advisory  (in  this  case,  climb)  that  provides  a  conflict  probability 
less  than  the  threshold.  The  delay  strategy  waits  until  the  moment  all  alerts  meet  or  exceed  the 
threshold  before  it  issues  a  climb  advisory. 
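
The three strategies can be compared side by side with a small decision function. This is one possible reading of the descriptions above, not the logic used to generate the policies in this report. Several details are assumptions: all three strategies are gated on the no-alert conflict probability exceeding the threshold, no alert is issued when no alert would lower the conflict probability, and the conservative delay strategy falls back to the best available alert when no alert is below the threshold (a case the text does not spell out).

```python
def choose_alert(p_none, p_alerts, threshold, strategy):
    """Return the alert to issue (a key of p_alerts) or None for no alert.

    p_none is the conflict probability without alerting; p_alerts maps each
    alert name to its conflict probability.
    """
    best = min(p_alerts, key=p_alerts.get)
    if p_none <= threshold or p_alerts[best] >= p_none:
        # Assumed common gate: not alerting is acceptable or already best.
        return None
    below = [a for a, p in p_alerts.items() if p < threshold]
    if strategy == "conservative":
        return best
    if strategy == "delay":
        # Wait until every alert meets or exceeds the threshold.
        return best if not below else None
    if strategy == "conservative_delay":
        # Wait until at most one alert remains below the threshold.
        return best if len(below) <= 1 else None
    raise ValueError(f"unknown strategy: {strategy}")


# Notional timeline with a threshold of 0.1 (values are made up).
timeline = [
    (0.05, {"climb": 0.01, "descend": 0.02}),
    (0.12, {"climb": 0.02, "descend": 0.05}),
    (0.30, {"climb": 0.05, "descend": 0.15}),
    (0.60, {"climb": 0.10, "descend": 0.40}),
]
for name in ("conservative", "conservative_delay", "delay"):
    print(name, [choose_alert(p0, pa, 0.1, name) for p0, pa in timeline])
```

On this made-up timeline the conservative strategy alerts at the second step, the conservative delay strategy at the third, and the delay strategy at the fourth, which mirrors the ordering described above.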

Yang and Kuchar developed a multistaged threshold alerting system for free flight that does
not fit into one of the three categories above [65]. In their prototype system, low-probability threats
produced passive alerts (e.g., changing the color of a traffic symbol) while high-probability threats
produced aural zone transgression messages to indicate to the pilot that an evasive maneuver should
be performed to resolve the conflict. There are four alert stages in total. During the first three
stages, alerts are issued to aid the pilot in resolving the conflict before tactical maneuvering is
required. During the fourth stage, air traffic control intervenes by issuing commands (e.g., heading,
vertical rate, or speed changes) intended to increase the amount of separation. The alerting logic
delays  the  issuance  of  an  alert  if  a  sufficient  number  of  maneuvers  is  still  available  to  the  pilot  in 
order  to  resolve  a  conflict.  They  defined  a  maneuver  as  available  if  the  probability  of  conflict  would 
be  reduced  to  less  than  0.05  by  performing  the  maneuver. 


4.6  POLICY  PLOTS 

This  section  presents  visual  representations  of  the  policies  of  Section  4.5.  The  policy  plots  are 
similar to those presented in Section 3.4. All plots show the policy for $h$-$\tau$ cross sections of the
state space. Figures 13, 14, and 15 show the policies generated by the three alerting strategies for
four different cross sections defined by the tuple $(\dot{h}_1, \dot{h}_2, s_{RA})$, supplied in the subfigure captions.
The  probability  of  conflict  was  estimated  using  importance  sampling  with  the  maximum-likelihood 
proposal  distribution  (Appendix  E.2)  with  100  sample  trajectories.  The  threshold  on  the  probability 
of  conflict  was  arbitrarily  set  to  0.01. 

The  policies  largely  agree  with  intuition.  For  instance,  when  both  aircraft  are  level,  as  in 
subfigures (a), the best action for the own aircraft is to descend if the intruder is close enough
above  and  to  climb  if  the  intruder  is  close  enough  below.  This  is  because,  although  the  aircraft  are 
flying  level,  the  noisiness  in  the  aircraft  vertical  rates  does  have  the  potential  of  causing  a  conflict 
when  the  aircraft  are  near  each  other. 

When the own aircraft is climbing at 1000 ft/min and the intruder is level, as in subfigures (b),
the  best  action  is  to  descend  when  the  intruder  is  above  even  when  the  vertical  separation  is  large. 
When  the  intruder  is  above  and  very  close  in  altitude,  i.e.,  less  than  100  ft,  the  policy  recommends 
climbing.  This  is  similar  to  an  altitude-crossing  RA  in  the  current  TCAS  logic  (Section  1.2). 

[Figure 11(a) decision tree: Is the probability of conflict without alerting greater than the threshold? If yes, issue the alert that minimizes the probability of conflict; if no, do not alert.]

Figure 11. Decision trees for alerting policies: (a) conservative policy, (b) delay policy, (c) conservative delay policy.

Figure 12. A notional diagram illustrating the difference between the three alerting strategies. In this
diagram, the alerting threshold is set to 0.1. The probability of conflict for the various actions is plotted.
The times at which the different alerting strategies issue a climb advisory are indicated.


This  might  appear  counter-intuitive.  However,  this  is  justified  because  the  own  aircraft  is  already 
climbing.  If  the  descend  maneuver  were  issued,  much  of  the  RA  execution  would  consist  of  reversing 
direction,  thus  resulting  in  a  higher  probability  of  conflict.  An  analogous  argument  holds  for 
subfigures  (c)  and  (d). 
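
The decision rule of the conservative policy in Figure 11(a) can be sketched in a few lines. The fragment below is only an illustration: the action labels, the dictionary layout, and the numerical estimates are assumptions made for this sketch and are not taken from this report.

```python
def conservative_policy(p_conflict, threshold):
    """Sketch of the conservative rule of Figure 11(a).

    p_conflict maps each action label to an estimated probability of
    conflict. The label "none" (hypothetical) is the estimate without alerting.
    """
    if p_conflict["none"] <= threshold:
        return "none"  # the risk without alerting is acceptable
    # Otherwise issue the alert that minimizes the probability of conflict.
    alerts = {a: p for a, p in p_conflict.items() if a != "none"}
    return min(alerts, key=alerts.get)


# Illustrative estimates only.
quiet = conservative_policy({"none": 0.005, "climb": 0.001, "descend": 0.002}, 0.01)
risky = conservative_policy({"none": 0.30, "climb": 0.02, "descend": 0.15}, 0.01)
print(quiet, risky)
```

The delay and conservative delay policies would additionally inspect how many alert actions keep the probability of conflict below the threshold; they are not sketched here.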

Other observations about the policies are also worth mentioning:


• When $\tau$ is small, there is usually no incentive to alert because the five-second pilot delay
makes any evasive maneuver useless.


• The policy of Figure 14 rarely alerts because, as Figure 11(b) indicates, there is
almost always one action for which the probability of conflict falls below the threshold.


• The conservative policy and conservative delay policy are very similar, except that the latter
alerts less when $\tau$ is large and the vertical separation is relatively small. This is because in
such circumstances there are multiple actions, i.e., both the climb and descend maneuvers,
for which the probability of conflict falls below the threshold.


•  The  conservative  policy  is  likely  to  have  a  high  false  alarm  rate  and  low  conflict  rate  compared 
to  the  delay  policy.  The  conservative  delay  policy  is  likely  to  provide  a  balance  between  the 
two.

Figure 13. Collision avoidance policy generated by the conservative strategy. The caption of each subfigure
indicates the cross section $(\dot{h}_1, \dot{h}_2, s_{RA})$ for which the policy is evaluated. The horizontal axis represents $\tau$
in seconds, and the vertical axis represents h in feet. Legend: descend, climb. Recoverable panel
captions: (a) 0 ft/min, 0 ft/min, no RA; (b) 1000 ft/min, 0 ft/min, no RA.

Figure 14. Collision avoidance policy generated by the delay strategy. The caption of each subfigure indicates
the cross section $(\dot{h}_1, \dot{h}_2, s_{RA})$ for which the policy is evaluated. The horizontal axis represents $\tau$ in seconds,
and the vertical axis represents h in feet. Legend: descend, climb.

Figure 15. Collision avoidance policy generated by the conservative delay strategy. The caption of each
subfigure indicates the cross section $(\dot{h}_1, \dot{h}_2, s_{RA})$ for which the policy is evaluated. The horizontal axis
represents $\tau$ in seconds, and the vertical axis represents h in feet. Legend: descend, climb. Recoverable
panel captions: (a) 0 ft/min, 0 ft/min, no RA; (b) 1000 ft/min, 0 ft/min, no RA.


4.7  DISCUSSION


This  section  compared  analytic,  dynamic  programming,  and  Monte  Carlo  methods  for  estimating 
the  probability  of  conflict  in  the  hypothetical  collision  avoidance  problem.  Although  the  encounter 
model  is  relatively  simple,  the  analytic  method  had  difficulty  providing  accurate  conflict  probability 
estimates because the dynamics are not exactly linear-Gaussian. Approximating the behavior of
more complex encounter scenarios with linear-Gaussian dynamics and analytically solving for the
conflict  probability  will  likely  result  in  poor  estimates.  The  accuracy  of  the  estimates  provided  by 
dynamic  programming  was  limited  due  to  the  coarseness  of  the  state  space  discretization.  Although 
the  dynamic  programming  estimates  could  be  improved  by  increasing  the  level  of  discretization, 
this  can  require  an  infeasible  number  of  states,  especially  if  the  dimensionality  of  the  state  space  is 
large. 

Of  the  methods  discussed  in  this  section,  conflict  probability  estimation  using  Monte  Carlo 
sampling seems to be the most promising. The quality of Monte Carlo estimates can be significantly
improved if a suitable proposal distribution is used and the samples are appropriately weighted.
Several different proposal distributions were considered, including proposal distributions that leverage
information from analytic approximations and dynamic programming solutions. Although the
analytic  or  dynamic  programming  estimates  may  in  themselves  be  inaccurate,  it  was  shown  that  the 
heuristic  information  they  provide  can  be  used  to  improve  the  efficiency  of  Monte  Carlo  estimates 
through  importance  sampling. 
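
The benefit of a well-chosen proposal distribution can be illustrated on a one-dimensional rare event. The sketch below does not use the encounter model of this report; as an assumed stand-in, it estimates the tail probability Pr(X > 4) for a standard normal X, first by direct sampling and then by importance sampling with a proposal centered on the rare region. The proposal, the sample size, and the function names are choices made for this illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def direct_estimate(n, level, rng):
    """Fraction of N(0, 1) samples that exceed `level`."""
    return sum(rng.gauss(0.0, 1.0) > level for _ in range(n)) / n


def importance_estimate(n, level, rng):
    """Sample from the proposal N(level, 1); weight each hit by p(x) / q(x)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(level, 1.0)
        if x > level:
            total += normal_pdf(x, 0.0, 1.0) / normal_pdf(x, level, 1.0)
    return total / n


level = 4.0
exact = 0.5 * math.erfc(level / math.sqrt(2.0))  # true tail probability
rng = random.Random(0)
direct = direct_estimate(10_000, level, rng)
weighted = importance_estimate(10_000, level, rng)
print(f"exact={exact:.3e}  direct={direct:.3e}  importance={weighted:.3e}")
```

With the same number of samples, the direct estimate rests on very few hits (often none), whereas nearly half of the proposal samples land in the rare region and contribute a weighted term.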

Incorporating higher-fidelity encounter dynamics in three spatial dimensions would likely degrade
the accuracy of analytic methods even further. Moreover, discretizing the state space
finely enough for accurate dynamic programming estimates may be challenging due to the higher
dimensionality of the models. In three-dimensional space, conflicts are much rarer because aircraft
can miss each other laterally in addition to vertically. Consequently, the advantage of using
importance sampling over direct sampling will be even more significant.

The  performance  of  the  policies  of  Section  4.5  is  dependent  on  a  number  of  factors.  Using  a 
more  accurate  method  for  calculating  the  probability  of  conflict  or  increasing  the  number  of  sample 
trajectories  generally  enhances  performance.  In  the  generation  of  the  policies  of  Figures  13,  14, 
and 15, the threshold was arbitrarily chosen. Effective threshold placement can be aided through
the use of system operating characteristic curves, discussed in Section 5.

Although policies based on conflict probability estimates may perform well, they will not be
optimal in general. A simple example of this is shown in Figure 16. The current state is represented
by the root node. From this state, the system may either alert or not alert (for simplicity in this
example, climb and descend are not distinguished). If an alert is issued, then the probability of
conflict is some $\gamma$ between zero and one. If an alert is not issued from the current state, the system
is given another opportunity to alert at the next state. From this next state, if an alert is issued,
a conflict is guaranteed not to occur, but if an alert is not issued a conflict is guaranteed to occur.
Dynamic programming, in this situation, will advise waiting to alert until the next state and will
prevent conflict. Any alerting strategy based solely on current conflict probability estimates will
alert immediately because the probability of conflict without alerting, evaluated from the current
state, is one.

Figure 16. An example of a situation where dynamic programming dominates a conflict probability approach
regardless of cost function or alerting threshold. From the root, "alert" leads to a conflict probability
$0 < \gamma < 1$ and "no alert" leads to a second state, from which "alert" leads to a conflict probability
of 0 and "no alert" to a conflict probability of 1.
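
The example of Figure 16 can be worked through numerically. The sketch below assumes a particular cost structure that is not specified by the figure: a conflict costs 1 and issuing an alert costs `alert_cost`. The function names are hypothetical. Backward induction over the two-stage tree waits at the root, while a rule that thresholds the no-alert conflict probability alerts at once.

```python
def dp_root_decision(gamma, alert_cost):
    """Backward induction on the two-stage tree of Figure 16.

    Assumed costs (not from the report): conflict = 1, alert = alert_cost.
    Returns the root decision and its expected cost.
    """
    # Second stage: alerting guarantees no conflict, not alerting guarantees one.
    wait_cost = min(alert_cost + 0.0, 1.0)
    # First stage: alerting now leaves a conflict probability of gamma.
    alert_now_cost = alert_cost + gamma
    if wait_cost < alert_now_cost:
        return "wait", wait_cost
    return "alert", alert_now_cost


def threshold_root_decision(threshold):
    """Alert when the no-alert conflict probability exceeds the threshold.

    From the root, never alerting leads to a certain conflict, so the
    estimate seen by the threshold rule is 1.
    """
    p_conflict_without_alert = 1.0
    return "alert" if p_conflict_without_alert > threshold else "wait"


print(dp_root_decision(gamma=0.3, alert_cost=0.1))
print(threshold_root_decision(threshold=0.5))
```

Under these assumed costs, alerting at the root leaves a conflict probability of 0.3, whereas waiting and alerting at the next state leaves none.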

The example in Figure 16 was constructed to serve as a very simple situation where a dynamic
programming policy dominates a threshold policy, regardless of cost function or alerting threshold.
In realistic encounters, dynamic programming will provide better policies than a threshold approach,
assuming that the discretization used for dynamic programming is sufficiently fine. In practice, it
may be difficult to provide the level of discretization needed by dynamic programming to provide a
close-to-optimal policy. Further experimentation will reveal how well threshold alerting strategies
approximate the optimal policy in practice.


5.  EVALUATION


Due to the safety-critical nature of collision avoidance systems, extensive research over the
years has focused on developing a methodology for evaluating such systems. This section enumerates
a set of mutually exclusive and collectively exhaustive alerting system outcomes and discusses
performance metrics used in prior safety assessments of TCAS and other alerting systems. It
describes how system operating characteristic curves can provide a visual representation of performance
tradeoffs when varying design parameters. Finally, this section presents the results from a
preliminary evaluation of the dynamic programming (DP) logic using a safety assessment tool used
for evaluating previous versions of the TCAS logic.


5.1  ALERTING  SYSTEM  OUTCOMES 

The outcomes of an alerting system scenario can be divided into six different categories depending
on three criteria: (1) whether an alert was necessary, (2) whether an alert was issued, and (3)
whether a conflict occurred. Before enumerating these categories, it is important to first define
what is meant by a necessary alert. Determining whether an alert is necessary in a particular
situation is not straightforward because the dynamics are nondeterministic. The standard
approach is to run simulations both with and without the alerting system using the same random
seed. Because the seeds are the same, the trajectories with and without the system will be exactly
the same until after an alert is issued. Once the pilot responds to the alert, the trajectory with
the alerting system will diverge from the trajectory without the alerting system, called the nominal
trajectory. An alert is defined to be necessary if the nominal trajectory results in conflict.
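
The paired-simulation idea can be sketched with a toy model. Everything in the fragment below is hypothetical and chosen only for illustration: the one-dimensional dynamics, the 400 ft alerting rule, the 100 ft conflict band, and the immediate 25 ft/s pilot response (no response delay). The point is that the same seed yields identical noise draws, so the two runs coincide until the alert changes the trajectory.

```python
import random


def simulate(seed, system_on, steps=40, dt=1.0):
    """Toy vertical encounter; returns (separations, alert_step, conflict)."""
    rng = random.Random(seed)
    h, rate, alert_step = 600.0, -20.0, None  # separation (ft), rate (ft/s)
    history = []
    for k in range(steps):
        rate += rng.gauss(0.0, 0.2) * dt  # identical noise draw in both runs
        if system_on and alert_step is None and abs(h) < 400.0:
            alert_step = k  # hypothetical alerting rule
        response = 25.0 if alert_step is not None else 0.0
        h += (rate + response) * dt
        history.append(h)
    conflict = any(abs(x) < 100.0 for x in history)
    return history, alert_step, conflict


nominal, _, nominal_conflict = simulate(seed=1, system_on=False)
equipped, alert_step, equipped_conflict = simulate(seed=1, system_on=True)

alert_necessary = nominal_conflict  # definition of a necessary alert
print(alert_necessary, alert_step is not None, equipped_conflict)
```

In this toy run the alert is necessary, it is issued, and no conflict occurs with the system.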

Table  4  enumerates  the  possible  alerting  system  outcomes  as  described  by  Winder  and  Kuchar 
[75].  Figure  17  illustrates  the  outcomes  using  example  trajectories.  Note  that,  according  to  the 
definition  of  necessary  alert,  the  following  two  outcomes  are  not  possible:  (1)  an  alert  is  necessary, 
an alert is not issued, and a conflict does not occur; and (2) an alert is not necessary, an alert is
not  issued,  and  a  conflict  occurs. 


TABLE 4

Outcome categories.

Outcome category     Abbreviation   Alert Necessary?   Alert Issued?   Conflict Occurred?
Correct Rejection    CR             no                 no              no
Correct Detection    CD             yes                yes             no
False Alarm          FA             no                 yes             no
Missed Detection     MD             yes                no              yes
Induced Conflict     IC             no                 yes             yes
Late Alert           LA             yes                yes             yes

Figure 17. Alerting outcome categories. Solid lines indicate trajectories with the system. Dotted lines indicate
trajectories without the system. Red lines indicate that a conflict occurred with the system (solid) or that a
conflict would have occurred without the system (dotted).
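
The outcome categories of Table 4 amount to a lookup on the three criteria. A minimal sketch follows; the function and variable names are hypothetical, and the two combinations ruled out by the definition of a necessary alert are rejected explicitly.

```python
OUTCOMES = {
    # (alert necessary, alert issued, conflict occurred): abbreviation
    (False, False, False): "CR",  # correct rejection
    (True, True, False): "CD",    # correct detection
    (False, True, False): "FA",   # false alarm
    (True, False, True): "MD",    # missed detection
    (False, True, True): "IC",    # induced conflict
    (True, True, True): "LA",     # late alert
}


def classify_outcome(necessary, issued, conflict):
    """Return the Table 4 abbreviation for one simulated encounter."""
    key = (bool(necessary), bool(issued), bool(conflict))
    if key not in OUTCOMES:
        # Without an alert, the trajectory equals the nominal one, so the
        # conflict flag must agree with the necessity flag.
        raise ValueError(f"impossible outcome: {key}")
    return OUTCOMES[key]


print(classify_outcome(True, True, False))
```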


5.2  METRICS 


The performance of alerting systems is usually assessed using a few relevant quantifiable performance
metrics. Many of these metrics can be calculated from counts of the six outcome categories
of Section 5.1 generated in simulation over a wide range of possible aircraft encounters. Here, CR,
CD, FA, MD, IC, and LA denote the counts of the alerting system outcomes.


1. Probability of conflict: The probability that a conflict occurs with the alerting system can
be estimated from the outcome counts, assuming each encounter is equally likely, as follows:

$$\Pr(C) = \frac{\mathrm{MD} + \mathrm{IC} + \mathrm{LA}}{\mathrm{CR} + \mathrm{CD} + \mathrm{FA} + \mathrm{MD} + \mathrm{IC} + \mathrm{LA}}. \tag{18}$$

If the encounters are not equally likely, as will be the case in Section 5.4, the relative likelihood
assigned to each encounter must be taken into account.


2. Probability of alert: The probability that the system alerts can be approximated by the
frequency

$$\Pr(A) = \frac{\mathrm{CD} + \mathrm{FA} + \mathrm{IC} + \mathrm{LA}}{\mathrm{CR} + \mathrm{CD} + \mathrm{FA} + \mathrm{MD} + \mathrm{IC} + \mathrm{LA}}. \tag{19}$$

Taken together with the previous metric, an alerting system is deemed effective if Pr(A) is just
large enough to achieve a suitably small Pr(C).


3. Probability of unnecessary alert: An alert is considered to be unnecessary if it is not
necessary to prevent conflict (i.e., the nominal trajectory does not result in conflict). The
probability of unnecessary alert, Pr(UA), is the probability that, when an alert is issued, it
is unnecessary. Pr(UA) can be approximated by

$$\Pr(UA) = \frac{\mathrm{FA} + \mathrm{IC}}{\mathrm{FA} + \mathrm{IC} + \mathrm{CD} + \mathrm{LA} + \mathrm{MD}}. \tag{20}$$


4. Probability of successful alert: An alert is considered to be successful if it is issued
and a conflict does not occur. The probability of successful alert, Pr(SA), is the probability
that, when an alert is issued, it is successful. An effective alerting system has a low rate of
unnecessary alert and a high rate of successful alert. Note that, according to the preceding
definitions of unnecessary and successful alert, an alert can be simultaneously classified as
unnecessary and successful. Pr(SA) can be approximated by

$$\Pr(SA) = \frac{\mathrm{CD} + \mathrm{FA}}{\mathrm{FA} + \mathrm{IC} + \mathrm{CD} + \mathrm{LA} + \mathrm{MD}}. \tag{21}$$


5. Risk ratio: The risk ratio is a measure of the change in the probability of conflict due to
the equipage of an alerting system. It is a concise metric for assessing the safety benefit of
equipping an alerting system. It is defined as the ratio between the probability that a conflict
will occur with the alerting system and the probability that a conflict will occur without the
alerting system:

$$\text{Risk ratio} = \frac{\Pr(C \mid \text{alerting system})}{\Pr(C \mid \text{no alerting system})} = \frac{\mathrm{MD} + \mathrm{IC} + \mathrm{LA}}{\mathrm{CD} + \mathrm{MD} + \mathrm{LA}}, \tag{22}$$

where Pr(C | alerting system) is the probability of conflict with the alerting system and
Pr(C | no alerting system) is the probability of conflict without the alerting system. A risk
ratio of zero indicates that the system resolves all conflicts, while a risk ratio of one indicates
that the system provides no additional benefit in reducing the number of conflicts. Risk ratios
above one indicate poorly-designed alerting systems that increase the probability of conflict.
Risk ratio has been used in prior TCAS safety studies [1-3, 14, 76-78].

6. Unresolved risk ratio component: This is the component of the risk ratio that is due
to unresolved conflict risk. A conflict is unresolved if it occurs both with and without the
alerting system. It is given by

$$\mathrm{RR}_{\text{unresolved}} = \frac{\mathrm{MD} + \mathrm{LA}}{\mathrm{CD} + \mathrm{MD} + \mathrm{LA}}. \tag{23}$$

This is equivalent to the conditional probability of an unresolved conflict given that an alert
is necessary.


7. Induced risk ratio component: This is the component of the risk ratio that is due to
induced conflict risk. A conflict is induced if it occurs with the alerting system but not
without the alerting system. It is given by

$$\mathrm{RR}_{\text{induced}} = \frac{\mathrm{IC}}{\mathrm{CD} + \mathrm{MD} + \mathrm{LA}}. \tag{24}$$


8. Vertical miss distance (VMD): This report defines VMD as the (nonnegative) vertical
separation between the aircraft at the point in the encounter when the horizontal miss distance
is minimal. Good alerting systems should increase VMD without excessively disturbing the
nominal flight path.

Figure 18. Example SOC curve.

With  additional  information  about  encounter  rates,  several  other  metrics  may  be  derived  such 
as  mean  time  between  conflict,  which  may  be  more  intuitive  as  a  safety  indicator.  The  metrics  above 
depend  on  the  rates  of  specific  discrete  events  (e.g.,  whether  or  not  a  conflict  or  an  alert  occurred). 
However,  other  metrics  can  be  derived  that  summarize  various  continuous  physical  properties  of 
the  simulations.  For  example,  it  may  be  insightful  to  study  the  effect  of  different  systems  on  the 
average  flight-plan  deviation  [41]  and  the  average  vertical  speed  and  acceleration  [38].  Once  the  logic 
supports  RA  changes  and  VSLs,  a  collection  of  other  metrics  can  be  defined  (e.g.,  the  rate  of  RA 
reversals) to predict operational acceptability. Prior TCAS studies have used "triplet" outcome
metrics to compare performance across no alerting system and two other alerting systems [79].
These  metrics  will  be  discussed  in  detail  in  a  future  safety  study  after  further  development  of  the 
DP  logic. 
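
The count-based metrics of this section (Equations 18 through 24) can be collected in one small routine. The sketch below assumes equally likely encounters and nonzero denominators; the function name, the dictionary keys, and the example counts are illustrative and are not results from this report. Note that Pr(UA) and Pr(SA) share the denominator FA + IC + CD + LA + MD, as in Equations 20 and 21.

```python
def alerting_metrics(CR, CD, FA, MD, IC, LA):
    """Count-based metrics of Section 5.2 (Equations 18-24)."""
    total = CR + CD + FA + MD + IC + LA
    non_cr = FA + IC + CD + LA + MD    # denominator of Eqs. 20 and 21
    nominal_conflicts = CD + MD + LA   # encounters in which an alert is necessary
    return {
        "Pr(C)": (MD + IC + LA) / total,
        "Pr(A)": (CD + FA + IC + LA) / total,
        "Pr(UA)": (FA + IC) / non_cr,
        "Pr(SA)": (CD + FA) / non_cr,
        "risk_ratio": (MD + IC + LA) / nominal_conflicts,
        "rr_unresolved": (MD + LA) / nominal_conflicts,
        "rr_induced": IC / nominal_conflicts,
    }


# Made-up counts for 10,000 encounters.
m = alerting_metrics(CR=7000, CD=1500, FA=1000, MD=200, IC=50, LA=250)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

By construction, the risk ratio equals the sum of its unresolved and induced components.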


5.3  SYSTEM  OPERATING  CHARACTERISTIC  CURVES 

The performance of an alerting system depends on many different parameters, such as dynamic
model parameters, sensor model parameters, and alerting thresholds. The evaluation of alerting-system
performance, therefore, can be quite complex. Kuchar [18,80] developed a unified methodology
for the evaluation of alerting-system performance in which performance tradeoffs are analyzed
through the use of system operating characteristic (SOC) curves. SOC curves, analogs of receiver
operating characteristic (ROC) curves in signal detection theory [81], were originally introduced as
plots of Pr(CD) versus Pr(FA). This report considers a modified version in which Pr(SA) is plotted
against Pr(UA). A notional example of an SOC curve is shown in Figure 18.

Each point on the SOC curve is called an operating point. The shape of the curve is traced out
as the alerting threshold, or any system parameter, is adjusted. Each alerting threshold maps to an
operating point. The ideal operating point is at the top-left corner of the plot, where Pr(UA) is zero
and Pr(SA) is one. Due to uncertainty in the state and measurements, the ideal operating point
is usually not obtainable. The dashed line represents the SOC curve for a system that provides
no additional benefit; an alert is as likely to be successful as it is to be unnecessary. Curves to
the left of this line indicate systems that increase the probability of successful alert for the same
level of unnecessary alerts. The closer the operating points are to the upper-left corner, the better
the system performs. Through the use of an SOC curve, the effect of a system parameter change
directly maps to a difference in system performance, thus highlighting the performance tradeoffs
that must be weighed when choosing system parameters.

Figure 19. Simple simulation framework.

Variants of the SOC curve are possible, so long as the horizontal axis represents the undesirable
outcome (to be minimized) and the vertical axis the desirable one (to be maximized). One variant
of the SOC curve involves plotting 1 - Pr(C) against Pr(A). The advantage of this SOC curve is
that the changes in the overall level of safety, 1 - Pr(C), which is sometimes more tangible, can be
analyzed directly.

This section discusses SOC curves generated in a simulation framework that uses the hypothetical
collision avoidance dynamics of Appendix A. Figure 19 outlines the simulation framework
used  to  compute  the  performance  metrics  of  Section  5.2.  The  next  section  discusses  evaluation 
with  an  aircraft  dynamic  model  that  is  significantly  more  realistic  and  has  been  used  in  prior 
TCAS  safety  studies. 

Figure 20(a) presents the SOC curves of Pr(SA) versus Pr(UA) produced by varying the
cost of alerting $\lambda$ from zero to one in discrete steps. Each operating point was obtained by
running 10,000 encounters in simulation. The upper-right regions of the curves correspond to costs
of alerting near zero and the lower-left regions correspond to costs near one. The three SOC
curves correspond to different levels of uncertainty in the state, as encoded in the amount of
noise in the aircraft vertical rates. The amount of noise is controlled by the standard deviation of
the vertical acceleration of the aircraft, $\sigma_{\ddot{h}}$. Figure 20(b) is a plot of 1 - Pr(C) versus Pr(A), again
for the three levels of system noise.

To gauge the performance of the DP logic against the existing TCAS logic, a simplified version
of the TCAS logic, called mini TCAS, was implemented. Appendix I outlines the principal
assumptions of mini TCAS. The real TCAS algorithm periodically receives noisy measurements of
the range, bearing, and altitude of nearby intruders and initializes and maintains intruder tracks
based upon these surveillance data. Additionally, TCAS knows the own aircraft state only with
some level of uncertainty. Mini TCAS, on the other hand, receives perfect information
about the own aircraft and intruder states and issues alerts based solely on this information.
This amounts to perfect tracking of own and intruder aircraft. Figure 20 shows several operating
points of the mini TCAS system (represented as colored dots). Each operating point corresponds
to a different altitude for own aircraft. Each altitude maps directly to a particular value for the
altitude layer and sensitivity level parameters of the TCAS logic. All the TCAS thresholds vary
as functions of either the altitude layer or the sensitivity level. Therefore, adjusting the altitude of
own aircraft causes mini TCAS to alert differently. Table 5 is a list of the own altitude, sensitivity
level, and altitude layer for each of the TCAS operating points of Figure 20.

Figure 20. SOC curves for the DP logic at different noise levels. Operating points for the TCAS logic at
different sensitivity levels are shown as dots.

If the cost of alerting is high, Pr(SA) and Pr(UA) are small because it is not advantageous
to alert. When the cost of alerting is decreased, alerts are issued more frequently, increasing
both Pr(SA) and Pr(UA). In general, the system performance degrades as the system uncertainty
increases. The alerts produced by the TCAS logic are mostly successful but, as is often the case,
mostly unnecessary. This is understandable because the objective of TCAS is to provide a minimum
level of vertical separation (ALIM) that is always greater than the vertical separation that defines
a conflict. In other words, TCAS alerts conservatively.


TABLE 5

Own altitude, sensitivity level, and altitude layer of TCAS operating points.

Own Altitude (ft)   Sensitivity Level   Altitude Layer
1000                3                   1
3000                4                   2
7000                5                   3
15,000              6                   4
30,000              7                   5
43,000              7                   6

If,  from  an  operational  standpoint,  the  requirements  on  the  system  dictate  that  the  rate  of 
unnecessary  alert  (or  false  alarm)  cannot  exceed  a  certain  level,  SOC  curves  can  be  used  to  find  the 
cost  of  alerting  for  which  the  probability  of  successful  alert  (or  correct  detection)  is  the  highest. 
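
Selecting an operating point under such a requirement is a constrained maximization over the points of the SOC curve. The sketch below uses made-up operating points, not values read from Figure 20; the field names and the function name are likewise hypothetical.

```python
def best_operating_point(points, max_pr_ua):
    """Return the feasible point with the highest Pr(SA), or None."""
    feasible = [p for p in points if p["pr_ua"] <= max_pr_ua]
    if not feasible:
        return None  # the requirement cannot be met on this curve
    return max(feasible, key=lambda p: p["pr_sa"])


# Illustrative SOC curve, one operating point per cost of alerting.
soc = [
    {"alert_cost": 0.9, "pr_ua": 0.02, "pr_sa": 0.10},
    {"alert_cost": 0.5, "pr_ua": 0.10, "pr_sa": 0.45},
    {"alert_cost": 0.2, "pr_ua": 0.30, "pr_sa": 0.70},
    {"alert_cost": 0.0, "pr_ua": 0.75, "pr_sa": 0.95},
]
choice = best_operating_point(soc, max_pr_ua=0.35)
print(choice)
```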

Similar  trends  can  be  observed  in  Figure  20(b).  When  the  cost  of  alerting  is  low,  the  overall 
level  of  safety  is  as  high  as  99%  but  at  the  expense  of  over-alerting.  In  general,  any  alerting  system 
that  issues  alerts  at  a  high  rate  is  capable  of  ensuring  a  high  level  of  safety.  Even  when  the  cost 
of alerting is high, causing Pr(A) to be effectively zero, approximately 80% of encounters do not
result in conflict. This matches the percentage of encounters that do not result in conflict without
the alerting system.

Appendix  F  proves  that  an  optimal  policy  generated  by  dynamic  programming  satisfies  the 
following  properties: 


1.  There  is  no  other  policy  with  the  same  alert  rate  and  a  lower  conflict  rate. 

2.  There  is  no  other  policy  with  the  same  conflict  rate  and  a  lower  alert  rate. 

It follows from this result that optimal policies generated by dynamic programming will trace out
the best possible SOC curve, assuming that the discretization is sufficiently fine.
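
The two properties describe non-dominance in the plane of alert rate and conflict rate. As a hedged illustration, the sketch below filters a set of made-up operating points down to those that no other point dominates; the points and function names are assumptions for this example.

```python
def dominates(a, b):
    """True if a = (alert rate, conflict rate) is no worse than b in both
    rates and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points):
    """Operating points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (alert rate, conflict rate) pairs from several policies.
points = [(0.05, 0.040), (0.10, 0.020), (0.10, 0.030), (0.20, 0.010), (0.25, 0.010)]
front = pareto_front(points)
print(front)
```

Here (0.10, 0.030) is dominated by a point with the same alert rate and a lower conflict rate, and (0.25, 0.010) by a point with the same conflict rate and a lower alert rate.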

Figure 20 shows SOC curves for the logic obtained by approximating the integral of Equation 15
using sigma-point sampling and multilinear interpolation. This is just one of several methods
for estimating the expected cost-to-go at the next time step from each state. For example, instead
of drawing a fixed number of deterministic samples, as the sigma-point sampling method does,
direct Monte Carlo samples from p(x' | x, a) can be drawn. Figure 21 shows the effect that using
direct Monte Carlo sampling has on the overall system performance. Unsurprisingly, performance
improves as more samples are used to estimate the expected cost-to-go. The use of the sigma-point
sampling method, which uses only five deterministically chosen samples that capture the mean and
covariance of p, results in the best performance while relieving much of the computational burden
involved in sampling the next states. In a similar manner, the effect of using different interpolation
methods and changing the resolution of the discretization can also be explored, although this is not
done in this report.
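
A common way to obtain a small set of deterministic samples that match a mean and covariance is the symmetric sigma-point construction, sketched below. The construction, the scaling parameter `kappa`, and the assumption of two independent zero-mean noise components (which gives 2n + 1 = 5 points) are choices made for this illustration and may differ from the details used in this report.

```python
import math


def sigma_points(std_devs, kappa=1.0):
    """Return (points, weights) for independent zero-mean Gaussian noise."""
    n = len(std_devs)
    scale = math.sqrt(n + kappa)
    points = [[0.0] * n]                 # central point at the mean
    weights = [kappa / (n + kappa)]
    for i, s in enumerate(std_devs):
        for sign in (1.0, -1.0):         # one symmetric pair per dimension
            p = [0.0] * n
            p[i] = sign * scale * s
            points.append(p)
            weights.append(1.0 / (2.0 * (n + kappa)))
    return points, weights


def expected_value(f, points, weights):
    """Weighted sum used in place of the integral over the next state."""
    return sum(w * f(p) for p, w in zip(points, weights))


points, weights = sigma_points([3.0, 1.0])
mean0 = expected_value(lambda p: p[0], points, weights)
var0 = expected_value(lambda p: p[0] ** 2, points, weights)
var1 = expected_value(lambda p: p[1] ** 2, points, weights)
print(len(points), mean0, var0, var1)
```

In a cost-to-go computation, `f` would be the interpolated cost-to-go evaluated at the next state reached under each noise sample.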

The next set of SOC curves evaluates the conflict probability estimation approach for different
threshold settings. Figure 22 shows the SOC curves for the three policies of Section 4: conservative,
delay, and conservative delay. As before, 10,000 encounters were used to calculate each operating
point. The curves were traced out as the threshold on the probability of conflict was varied.
The action that was executed at each time step was determined by calculating the probability of
conflict online and applying the collision avoidance logic of Section 4. The probability of conflict
was estimated using 100 trajectories drawn from the maximum-likelihood acceleration proposal
distribution. Alternatively, one could reduce computation time during the evaluation process, at
the expense of accuracy, through the use of a lookup table generated offline.

Figure 21. Effect of different sampling methods on overall system performance. Legend: sigma-point
sampling; Monte Carlo with 1 sample; Monte Carlo with 5 samples; Monte Carlo with 100 samples.

The range over which the threshold was varied to produce SOC curves that span the metric
space well depends on the policy. In the case of the conservative policy, the SOC curves were
plotted by varying the probability of conflict threshold from zero to one in uniform increments.
When the threshold is close to one, the system rarely alerts. When the threshold is near zero, the
system almost always alerts. Presumably, if the threshold is one, the probability of alerting, Pr(A),
should be zero. The curves of the conservative policy never extend to Pr(A) = 0, however. This
is because the estimate of the probability of conflict using importance sampling is not confined to
the interval [0,1], as a true probability is. Therefore there is some non-zero probability that
the estimated probability of conflict without alerting is greater than one and the system alerts. In
the limit as the number of sample trajectories approaches infinity, however, the estimate should lie
within the interval [0,1]. The estimate can be forced to lie within the interval [0,1] by normalizing the
weights. However, this introduces a bias that goes to zero as the number of samples approaches
infinity [82].
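
The difference between the plain and the weight-normalized importance sampling estimates can be seen on a one-dimensional stand-in problem. The target N(0, 1), the proposal N(shift, 1), and the near-certain event used below are assumptions of this sketch, not the encounter model of this report.

```python
import math
import random


def importance_estimates(n, shift, event, rng):
    """Return (plain, normalized) estimates of Pr(event) under N(0, 1),
    using samples from the proposal N(shift, 1)."""
    weighted_hits = 0.0
    weight_sum = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        w = math.exp(-shift * x + 0.5 * shift * shift)  # p(x) / q(x)
        weight_sum += w
        if event(x):
            weighted_hits += w
    plain = weighted_hits / n                # unbiased, may exceed one
    normalized = weighted_hits / weight_sum  # biased, always within [0, 1]
    return plain, normalized


rng = random.Random(0)
plain, normalized = importance_estimates(100, 0.5, lambda x: x > -5.0, rng)
print(plain, normalized)
```

Because the event is almost certain, the plain estimate is essentially the average weight, which fluctuates around one and can land above it; the normalized estimate cannot.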

The SOC curves for the delay policy and the conservative delay policy were obtained by
varying  the  threshold  from  10“^^  to  one  and  from  10  to  one  in  a  logarithmic  scale,  respectively. 
The operating point corresponding to a threshold of zero was also computed. A logarithmic scale
was necessary to capture the variability of the curve in the regions where Pr(SA) and Pr(UA) are
high.

The results suggest that all three policies perform similarly, although the conservative policy
outperforms the other two, as shown by its SOC curves, which almost always lie above and to the left
of the curves formed by the other policies. In other words, the conservative policy nearly always
achieves a higher rate of successful alert given some rate of unnecessary alert and a lower rate of
conflict given some alert rate. Moreover, the conservative policy tends to issue fewer alerts while
guaranteeing the same level of safety. In the limit as the alert rate goes to one, the rate of successful
alert and the rate that no conflict occurs for all policies tend to unity.

Appendix F implies that the curve of 1 - Pr(C) versus Pr(A) for the optimal policy never lies to
the right of or below that of any other policy. However, because the policy calculated in this report
is only approximately optimal, due primarily to the discretization process, the results do indeed
indicate that there are some regions in which the other policies outperform the (approximately)
optimal policy. For example, when $\sigma_{\ddot{h}} = 1$ ft/s², the lowest conflict rate that the optimal policy
achieves is 0.006 while the other policies achieve a minimum conflict rate of zero, given a sufficient
alert rate.


5.4  PRELIMINARY  SAFETY  EVALUATION 

This section presents the preliminary safety evaluation of the dynamic programming logic using the
Collision Avoidance System Safety Assessment Tool (CASSATT), developed at Lincoln Laboratory.
CASSATT performs fast-time Monte Carlo analysis of close encounters between two or more aircraft
over  a  period  on  the  order  of  one  minute  near  the  closest  point  of  approach.  CASSATT  has 
been  used  for  prior  TCAS  safety  analysis  [76]  and  sense-and-avoid  development  for  unmanned 
aircraft  [77]. 

The CASSATT framework was built in Matlab and Simulink and has been compiled into
native  code  using  Real-Time  Workshop.  The  framework  was  designed  to  be  modular  to  allow 
different  collision  avoidance  systems  and  sensor  models  to  be  easily  incorporated.  CASSATT  was 
extended  to  allow  communication  with  an  arbitrary  collision  avoidance  system  over  a  TCP/IP  socket 
connection  [38].  The  collision  avoidance  system  runs  as  a  server  to  which  CASSATT  connects  as  a 
client. 

Figure  23  provides  an  overview  of  the  simulation  framework.  An  encounter  model  generates 
the  initial  conditions  and  scripted  maneuvers  for  both  aircraft  involved  in  the  encounter.  For 
the  purposes  of  this  study,  the  encounter  model  developed  by  Lincoln  Laboratory  for  cooperative 
aircraft  [7]  was  used.  This  encounter  model  was  developed  from  nine  months  of  national  radar  data 
from  over  120  sensors  maintained  by  the  Federal  Aviation  Administration  and  the  Department  of 
Defense.  A  dynamic  Bayesian  network  representing  the  behavior  of  the  aircraft  was  learned  from 
actual  encounters  extracted  from  the  data.  Generating  new  encounters  for  use  in  Monte  Carlo 
analysis  involves  sampling  from  the  dynamic  Bayesian  network. 

Figure 22. SOC curves for threshold alerting policies. Curves: conservative delay threshold, delay threshold, and conservative threshold. Panels: Pr(SA) versus Pr(UA) and 1 - Pr(C) versus Pr(A).

Figure 23. CASSATT simulation framework.

The initial conditions define the starting positions of the aircraft. The scripted maneuvers
control aircraft velocities and accelerations at each time step. The dynamic model takes as input
the state and scripted maneuvers at the previous time step and returns the state of the aircraft at
the current time step. The state of one aircraft is represented by a seven-dimensional vector

$$(v, N, E, h, \psi, \theta, \phi), \qquad (25)$$

where $N$, $E$, and $h$ are the north, east, and altitude components of the position in a local flat-earth
coordinate system, $\psi$, $\theta$, and $\phi$ are the yaw, pitch, and roll angles, and $v$ is the true airspeed. In a
two-aircraft system, the state is

$$x = (v_1, N_1, E_1, h_1, \psi_1, \theta_1, \phi_1, v_2, N_2, E_2, h_2, \psi_2, \theta_2, \phi_2), \qquad (26)$$

where  subscript  1  refers  to  the  own  aircraft  and  subscript  2  refers  to  the  intruder.  The  motion 
of  the  aircraft  is  driven  by  controls  in  the  turn  rate,  vertical  rate,  and  airspeed  acceleration,  as 
supplied by the encounter model or, optionally, as defined by the user. These control values may
change  every  tenth  of  a  second.  Aircraft  transient  response  characteristics  and  performance  limits 
such  as  maximum  pitch  rate  or  bank  angle  are  also  included  in  the  dynamic  model. 

The sensor model takes as input the current state from the dynamic model and produces
an observation, or sensor measurement. The sensor produces measurements of the own aircraft
altitude, $h_\text{own}$, the intruder altitude, $h_\text{int}$, the bearing to the intruder, $\chi$, and the range to the
intruder, $r$. The measurement $z$ is given by

$$z = (\tilde h_\text{own}, \tilde h_\text{int}, \tilde \chi, \tilde r), \qquad (27)$$

where the tilde is used to designate a measurement. The measured bearing is equal to the
true  bearing  plus  additive  zero-mean  Gaussian  noise  with  a  standard  deviation  of  10°.  Similarly, 
the  modeled  range  noise  is  zero-mean  Gaussian  with  a  standard  deviation  of  50  ft.  The  altitude 
measurements  are  quantized  to  25-ft  increments.  The  bias  in  the  altimetry  error  is  distributed 
according  to  a  Laplacian  distribution  whose  parameters  are  dependent  upon  the  altitude  layer. 
All  sensor  error  parameters  match  those  specified  in  the  International  Civil  Aviation  Organization 
(ICAO)  model  [14]. 
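
The measurement process described above can be sketched in a few lines. This is a minimal illustration, not the CASSATT sensor model: the function names are hypothetical, the noise levels and the 25-ft quantization are taken from the text, and the altitude-layer-dependent Laplacian altimetry bias is deliberately left out.

```python
import math
import random

BEARING_SIGMA_DEG = 10.0  # bearing noise standard deviation, from the text
RANGE_SIGMA_FT = 50.0     # range noise standard deviation, from the text
ALT_QUANTUM_FT = 25.0     # altitude quantization step, from the text


def quantize(value, step=ALT_QUANTUM_FT):
    """Round a value to the nearest multiple of step (halves round up)."""
    return step * math.floor(value / step + 0.5)


def measure(h_own, h_int, bearing_deg, range_ft, rng):
    """Return a hypothetical measurement (h_own, h_int, bearing, range).

    Altitudes are quantized; bearing and range receive additive
    zero-mean Gaussian noise. Altimetry bias is NOT modeled here.
    """
    return (
        quantize(h_own),
        quantize(h_int),
        bearing_deg + rng.gauss(0.0, BEARING_SIGMA_DEG),
        range_ft + rng.gauss(0.0, RANGE_SIGMA_FT),
    )
```

Passing the random number generator in explicitly keeps the sketch reproducible, which is convenient when the same encounter is replayed under different logics.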

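Based on the most recent observation, the state estimation process updates the estimates of
the positions, $h_\text{own}$, $h_\text{int}$, and $r$, of the rates, $\dot h_\text{own}$, $\dot h_\text{int}$, and $\dot r$, and of the range acceleration, $\ddot r$. The
state estimate of the tracker $x_\text{tracker}$ is

$$x_\text{tracker} = (\hat h_\text{own}, \hat h_\text{int}, \hat r, \hat{\dot h}_\text{own}, \hat{\dot h}_\text{int}, \hat{\dot r}, \hat{\ddot r}), \qquad (28)$$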
where a hat is used to designate an estimate. The state of the tracker (Equation 28) is not the
same as the full 14-dimensional state of the aircraft (Equation 26) as used in the dynamic model
(which is not completely observable). An $\alpha$-$\beta$ filter was used to track altitude and altitude rate, and
an $\alpha$-$\beta$-$\gamma$ filter was used to track intruder range, range rate, and range acceleration. The tracker
is a simplified version of the tracker implemented in the current version of TCAS for intruders
broadcasting altitude with 25-ft quantization. Appendix H discusses the tracker in further detail.
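
For readers unfamiliar with this filter family, the following is a minimal sketch of a textbook altitude and altitude-rate filter of the alpha-beta type. It is not the tracker of Appendix H: the gains, the one-second update interval, and the function names are illustrative assumptions.

```python
def alpha_beta_step(h_est, rate_est, z, dt, alpha, beta):
    """One predict/correct cycle of a textbook alpha-beta filter.

    h_est, rate_est: previous altitude and altitude-rate estimates
    z: new altitude measurement; dt: time since the last update
    alpha, beta: smoothing gains (illustrative values, not from the report)
    """
    h_pred = h_est + dt * rate_est        # predict with constant rate
    residual = z - h_pred                 # innovation
    h_new = h_pred + alpha * residual     # correct position
    rate_new = rate_est + (beta / dt) * residual  # correct rate
    return h_new, rate_new


def track(measurements, dt=1.0, alpha=0.5, beta=0.2):
    """Run the filter over a measurement sequence, starting from the
    first measurement with zero initial rate."""
    h_est, rate_est = measurements[0], 0.0
    history = []
    for z in measurements[1:]:
        h_est, rate_est = alpha_beta_step(h_est, rate_est, z, dt, alpha, beta)
        history.append((h_est, rate_est))
    return history
```

A filter for range, range rate, and range acceleration adds a third gain and a constant-acceleration prediction, but follows the same predict/correct pattern.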

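The DP logic is evaluated on the updated state estimate. Before evaluating the logic, the
output of the tracker, $x_\text{tracker}$, must be mapped to an approximate input to logic evaluation, $x_\text{logic}$,
defined by

$$x_\text{logic} = (h, \tau, \dot h_1, \dot h_2, s_\text{RA}), \qquad (29)$$

as in Table 3. If an RA has not been issued and the aircraft are estimated to be closing (i.e., $\hat{\dot r} < 0$),
the mapping from $x_\text{tracker}$ to $x_\text{logic}$ is described by the following equations:

$$\begin{aligned}
h &= \hat h_\text{int} - \hat h_\text{own}, \\
\tau &= |\hat r / \hat{\dot r}|, \\
\dot h_1 &= \hat{\dot h}_\text{own}, \\
\dot h_2 &= \hat{\dot h}_\text{int}, \\
s_\text{RA} &= \text{clear of conflict}.
\end{aligned}$$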

The optimal action to take is $\pi^*(x_\text{logic})$. Once an RA has been issued, the RA is maintained for
the  remainder  of  the  encounter.  If  the  aircraft  are  diverging  in  range,  no  RA  is  issued.  Receiving 
the  appropriate  action  from  the  logic  evaluation  step,  the  dynamic  model  updates  the  state  x,  and 
the  process  continues  until  the  end  of  the  encounter. 
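
The mapping from tracker output to logic input can be sketched as follows. The class and field names are hypothetical, the sign convention for relative altitude (intruder minus own) is an assumption read from the mapping above, and only the case in which no RA has yet been issued is covered.

```python
from typing import NamedTuple, Optional


class TrackerState(NamedTuple):
    """Tracker estimates: altitudes (ft), range (ft), and their rates."""
    h_own: float
    h_int: float
    r: float
    dh_own: float
    dh_int: float
    dr: float
    ddr: float


class LogicState(NamedTuple):
    """Input to logic evaluation, following Equation 29."""
    h: float      # relative altitude, assumed intruder minus own
    tau: float    # time to horizontal closure, |r / r_dot|
    dh1: float    # own vertical rate
    dh2: float    # intruder vertical rate
    s_ra: str     # advisory state


def to_logic_state(x: TrackerState) -> Optional[LogicState]:
    """Map tracker output to logic input before any RA is issued.

    Returns None when the aircraft are not closing in range, in which
    case no RA is issued and the logic is not evaluated.
    """
    if x.dr >= 0.0:
        return None
    return LogicState(
        h=x.h_int - x.h_own,
        tau=abs(x.r / x.dr),
        dh1=x.dh_own,
        dh2=x.dh_int,
        s_ra="clear of conflict",
    )
```

Note that the tracked range acceleration is not used by this mapping; it is carried in the tracker state only.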

For the preliminary safety evaluation of the DP logic discussed in this report, 500,000 trajectories
were generated using the encounter model. The aircraft trajectories were rotated and
translated  to  create  the  geometry  at  CPA  as  given  by  the  model.  The  impact  of  using  the  DP  logic 
was  evaluated  in  the  CASSATT  simulation  framework  (Figure  23).  As  a  baseline,  the  performance 
of  TCAS  II  version  7.1  was  also  evaluated  for  comparison.  For  the  purposes  of  this  study,  the 
intruder  was  equipped  with  a  Mode  S  transponder  but  not  equipped  with  a  collision  avoidance 
system.  Both  own  aircraft  and  the  intruder  reported  altitude  in  25-ft  increments.  The  Mode  S 
address  of  the  intruder  was  higher  than  that  of  own  aircraft.  For  this  preliminary  evaluation,  the 
DP logic was calculated using a cost of alerting of 0.1. Analysis of the performance of the DP logic
in  the  CASSATT  framework  when  using  a  different  cost  of  alerting  is  beyond  the  scope  of  this 
initial  study. 

In  the  calculation  of  the  DP  logic,  conflict  between  the  aircraft  was  defined  as  less  than  100  ft 
vertical  separation  at  the  point  when  horizontal  separation  has  been  lost.  It  is  rare,  however,  for 
the  aircraft  to  lose  all  horizontal  separation  in  the  simulation  framework  of  Figure  23.  CASSATT 
defines  a  conflict  more  generally  as  a  near  mid-air  collision  (NMAC),  a  loss  of  separation  less  than 
500  ft  horizontally  and  100  ft  vertically  [4]. 
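
The NMAC criterion is simple enough to state as code. The sketch below assumes trajectories sampled at common time steps as (north, east, altitude) triples in feet; that data layout and the function names are illustrative, not CASSATT's.

```python
import math

NMAC_HORIZONTAL_FT = 500.0  # from the text
NMAC_VERTICAL_FT = 100.0    # from the text


def is_nmac(horizontal_sep_ft, vertical_sep_ft):
    """True when separation is below both NMAC thresholds at once."""
    return (horizontal_sep_ft < NMAC_HORIZONTAL_FT
            and abs(vertical_sep_ft) < NMAC_VERTICAL_FT)


def encounter_has_nmac(own_track, intruder_track):
    """Check every common time step of two (north, east, alt) tracks."""
    for (n1, e1, h1), (n2, e2, h2) in zip(own_track, intruder_track):
        if is_nmac(math.hypot(n1 - n2, e1 - e2), h1 - h2):
            return True
    return False
```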

Instead  of  sampling  the  encounters  directly  from  the  distribution  given  by  the  encounter 
model, encounters are sampled from an alternative distribution, called the proposal distribution,
to increase the probability that an encounter results in an NMAC. This procedure is known as
importance  sampling,  as  discussed  in  a  different  context  earlier  in  Section  4.  In  addition  to  reducing 
the  computational  time,  importance  sampling  produces  lower  variance  estimates  of  performance 
metrics.  The  encounters  generated  using  importance  sampling  are  not  equally  likely  to  occur. 
Instead,  a  weight  is  assigned  to  each  encounter  that  indicates  the  likelihood  with  which  it  would 
have  been  sampled  from  the  original  distribution  of  the  model.  The  probabilities  of  Section  5.2  are 
approximated  using  weighted  encounters. 
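
The weighting idea can be illustrated on a toy problem unrelated to aircraft encounters: estimating the small probability that a standard normal variable exceeds a threshold by sampling from a proposal centered on the rare region. The distributions, the function names, and the unnormalized-weight estimator are assumptions chosen for illustration only.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(event, n, rng, target_mean=0.0, proposal_mean=3.0):
    """Estimate Pr(event) under N(target_mean, 1) using samples drawn
    from the proposal N(proposal_mean, 1).

    Each sample is weighted by target density / proposal density, and
    the weighted indicator values are averaged (unnormalized weights).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(proposal_mean, 1.0)
        weight = normal_pdf(x, target_mean, 1.0) / normal_pdf(x, proposal_mean, 1.0)
        total += weight * (1.0 if event(x) else 0.0)
    return total / n
```

Because the proposal places most samples where the rare event occurs, far fewer samples are needed than with direct sampling to reach a given accuracy, which is the motivation given above.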

Table 6 lists the probabilities of each of the alerting system outcomes of Section 5.1 for both
the DP logic and TCAS. These estimates ignore the effect of altimetry bias. The remainder of this
section compares several metrics of Section 5.2 between the DP logic and TCAS.


TABLE 6

Probability of each alerting system outcome.

Outcome category     DP                       TCAS
Correct Rejection    8.69 · 10^-1             4.89 · 10^-1
Correct Detection    2.88 · 10^-3             2.93 · 10^-3
False Alarm          1.28 · 10^-1             5.08 · 10^-1
Missed Detection     6.25 · 10^[illegible]    0.00 · 10^0
Induced Conflict     8.45 · 10^[illegible]    3.56 · 10^[illegible]
Late Alert           4.50 · 10^[illegible]    6.08 · 10^[illegible]

5.4.1  Probability  of  alert 

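The probability of alert can be approximated using importance sampling as follows:

$$\Pr(A) \approx \frac{1}{N} \sum_{i=1}^{N} w_i f(\omega^{(i)}), \qquad (31)$$

where $N$ is the number of encounters, $\omega^{(i)}$ represents the $i$th encounter, $w_i$ is the importance weight assigned to that encounter, and $f(\omega)$ is a function that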
takes  as  input  an  encounter  and  returns  one  if  the  alerting  system  alerted  during  the  encounter 
and  zero  otherwise.  A  high  probability  of  alert  is  generally  undesirable  and  may  indicate  that  the 
system  may  be  alerting  more  often  than  necessary.  The  probability  of  alert  can  also  be  estimated 
by  summing  the  probabilities  of  correct  detection,  false  alarm,  induced  conflict,  and  late  alert  as 
given  in  Table  6. 

The results of the simulation using 500,000 encounters show that the probability of alert
for TCAS is 51.1%, whereas the probability of alert for the DP logic is 13.1%. This situation
admits several possible explanations. As previously discussed, TCAS alerts frequently due to the
minimum vertical separation requirements that it tries to meet. Additionally, TCAS has a wider
spectrum of alerts that it can issue (Section 1.2). In addition to climb and descend RAs, it can
issue vertical speed limits, increase rate RAs, and sense reversals. Notwithstanding, TCAS does
not issue RAs against intruders for which a large horizontal miss distance is projected. The DP
logic does not account for such circumstances and therefore, although it alerts less than TCAS, still
has the tendency to over-alert. By adding a miss distance filter similar to that which is included
in TCAS, the alert rate could be reduced further.

TABLE 7

Risk ratios with TCAS v. 7.1 and the DP logic.

Logic    Risk Ratio      Unresolved Risk Ratio    Induced Risk Ratio
TCAS     5.54 · 10^-2    2.89 · 10^-2             2.64 · 10^-2
DP       1.36 · 10^-1    9.19 · 10^-2             4.43 · 10^-2

5.4.2  Probability  of  NMAC 

Using importance sampling, the probability of an NMAC, given that the aircraft are involved
in an encounter, can be estimated by

$$\Pr(\text{NMAC}) \approx \frac{1}{N} \sum_{i=1}^{N} w_i \Pr(\text{NMAC} \mid \omega^{(i)}), \qquad (32)$$

where $w_i$ is the importance weight of the $i$th encounter and $\Pr(\text{NMAC} \mid \omega^{(i)})$ is the probability
that the $i$th encounter resulted in an NMAC. The
probability that the $i$th encounter resulted in an NMAC is calculated by integrating the combined
altimeter error distribution of the aircraft over the range of errors that would cause the true vertical
separation to be less than 100 ft [79,83].
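
The integration over altimeter error can be sketched under a strong simplifying assumption: that the combined error of the two altimeters is a single zero-mean Laplacian with a known scale. The report's combined distribution depends on the altitude layer of each aircraft and is not reproduced here; the scale value and function names below are hypothetical.

```python
import math


def laplace_cdf(x, scale):
    """CDF of a zero-mean Laplace distribution with the given scale."""
    if x < 0.0:
        return 0.5 * math.exp(x / scale)
    return 1.0 - 0.5 * math.exp(-x / scale)


def prob_vertical_nmac(apparent_sep_ft, scale_ft, threshold_ft=100.0):
    """Probability that the true vertical separation is below threshold.

    Assumes true separation = apparent separation + error, with the
    combined error Laplace(0, scale_ft). The event |sep + error| < T is
    the interval -T - sep < error < T - sep, integrated via the CDF.
    """
    upper = laplace_cdf(threshold_ft - apparent_sep_ft, scale_ft)
    lower = laplace_cdf(-threshold_ft - apparent_sep_ft, scale_ft)
    return upper - lower
```

Under this assumption, an encounter whose simulated vertical separation is comfortably above 100 ft still contributes a small nonzero probability of NMAC, which is why the estimate sums probabilities rather than counting outcomes.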

Without any collision avoidance system, Pr(NMAC) = 0.00311. This means that the simulation
indicates that one in approximately 321 close encounters results in an NMAC when no
collision avoidance system is equipped. With TCAS, Pr(NMAC) is reduced to 0.000172. With the
DP logic, Pr(NMAC) = 0.000424. In either case, the risk of an NMAC is significantly reduced.
The probability of NMAC with the DP logic is approximately 2.5 times as large as the probability of
NMAC with the existing TCAS.

It  is  encouraging  to  observe  that  the  probability  of  an  NMAC  using  the  DP  logic  is  comparable 
to  that  using  TCAS.  Decreasing  the  cost  of  alerting  will  decrease  the  probability  of  an  NMAC  with 
the  DP  logic  even  further.  Apart  from  this,  there  are  still  several  fundamental  ways  in  which  the 
DP  logic  can  be  substantially  improved,  including  extending  the  motion  model  used  to  compute 
the  logic,  accounting  for  the  effects  of  sensor  uncertainty,  and  expanding  the  number  of  alerts  that 
the  DP  logic  can  issue.  These  are  discussed  in  Section  5.6.  While  the  results  summarized  in  this 
report are indeed promising, they are still preliminary in nature, and further refinement of the DP
logic  is  expected  to  result  in  significant  performance  gains. 
Figure 24. Risk ratio convergence. Horizontal axis: number of encounters.


5.4.3  Risk  Ratios 

Section 5.2 defined the risk ratio as the ratio between the probability that a conflict would
occur  with  the  alerting  system  and  the  probability  that  a  conflict  would  occur  without  the  alerting 
system.  A  risk  ratio  of  zero  is  indicative  of  an  effective  collision  avoidance  system,  while  a  risk  ratio 
above  one  is  indicative  of  an  adverse  system.  If  a  conflict  is  defined  as  an  NMAC,  the  risk  ratio 
for  TCAS  is  0.0554,  while  the  risk  ratio  for  the  DP  logic  is  0.136.  Table  7  lists  the  risk  ratios  for 
TCAS  and  the  DP  logic  as  well  as  the  unresolved  and  induced  risk  ratio  components. 
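
As a worked check, the risk ratios can be recomputed from the NMAC probabilities reported in Section 5.4.2. The helper name is hypothetical; the small difference from the tabulated 0.0554 for TCAS comes from the inputs being rounded to three significant figures.

```python
def risk_ratio(p_conflict_with, p_conflict_without):
    """Ratio of conflict probability with the system to that without."""
    if p_conflict_without <= 0.0:
        raise ValueError("baseline conflict probability must be positive")
    return p_conflict_with / p_conflict_without


# Probabilities of NMAC as reported in the text.
P_NMAC_NONE = 0.00311
P_NMAC_TCAS = 0.000172
P_NMAC_DP = 0.000424

tcas_ratio = risk_ratio(P_NMAC_TCAS, P_NMAC_NONE)  # close to 0.055
dp_ratio = risk_ratio(P_NMAC_DP, P_NMAC_NONE)      # close to 0.136

# The unresolved and induced components of Table 7 sum to the total.
tcas_components = 2.89e-2 + 2.64e-2
dp_components = 9.19e-2 + 4.43e-2
```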

These  estimates  of  risk  ratio  are  sensitive  to  the  number  of  samples  (encounters)  used  to 
calculate  them.  As  more  samples  are  used,  the  accuracy  of  the  estimates  increases.  Figure  24  shows 
the  convergence  of  the  risk  ratio  estimates  as  the  number  of  encounters  increases.  Empirically,  it 
appears  that  the  estimates  have  converged.  This  also  serves  to  show  that,  with  a  high  degree  of 
confidence,  the  risk  ratio  of  TCAS  is  lower  than  that  of  the  DP  logic.  Further  research  will  reveal 
the extent to which the DP logic can be improved.

5.4.4  Vertical  Miss  Distance 

Recall  that  VMD  is  defined  as  the  (nonnegative)  vertical  separation  between  the  aircraft  at 
the point in the encounter when the horizontal miss distance is minimal. Figure 25 compares the
VMD  using  TCAS  and  VMD  using  the  DP  logic  for  all  500,000  weighted  simulated  encounters.  The 
figure  consists  of  several  regions.  Points  on  the  diagonal  line  represent  encounters  for  which  VMD 
using  TCAS  and  VMD  using  the  DP  logic  are  the  same.  Points  that  lie  below  the  line  correspond 
to  encounters  for  which  the  DP  logic  provides  greater  vertical  separation  than  TCAS.  Points  that 
lie  above  the  line  correspond  to  encounters  for  which  TCAS  provides  greater  vertical  separation 
than  the  DP  logic.  As  the  figure  shows,  most  of  the  time  DP  and  TCAS  produce  the  same  vertical 
separation, which is because neither issues an alert. When at least one system alerts, TCAS is
more than twice as likely to provide greater vertical separation. Decreasing the cost of alerting will
result in a DP logic that will, on average, increase vertical separation.

Figure 25. Comparison of VMD using TCAS v. 7.1 with VMD using the DP logic. The relative frequency of
the various cells is indicated using a logarithmic gray scale. Altimetry bias is zero.


5.5  EXAMPLE  ENCOUNTERS 

The  previous  sections  assessed  the  overall  performance  of  the  logic  in  terms  of  a  few  comprehensive 
metrics. It can be enlightening, however, to closely examine the logic on a small set of simulated
encounters.  This  section  analyzes  two  simulated  encounters:  one  in  which  the  DP  logic  prevents  an 
NMAC  that  TCAS  fails  to  prevent  and,  conversely,  one  in  which  TCAS  prevents  an  NMAC  that 
the  DP  logic  fails  to  prevent.  Approximately  0.008%  of  encounters  fall  under  the  former  category, 
while  approximately  0.011%  fall  under  the  latter  category. 

Figure 26 shows the horizontal and vertical profiles of a simulated encounter in which the DP
logic prevents an NMAC that TCAS fails to prevent. Both the own aircraft and the intruder
are initially flying level. After 22 seconds, own aircraft begins to climb, which causes TCAS to
detect a threat and issue a Do Not Climb VSL three seconds later. However, even though the own
aircraft is compliant with the RA and the intruder remains level, an NMAC occurs. The DP logic,
on the other hand, prevents an NMAC by issuing a descend advisory 23 seconds in, which remains
in effect for the remainder of the encounter.

Figure 26. Simulated encounter in which the DP logic prevents an NMAC that TCAS fails to prevent. (a) Horizontal profile. (b) Vertical profile.

Analysis  of  other  encounters  of  this  type  suggests  other  reasons  why  the  DP  logic  performs 
better  than  TCAS  in  select  encounters: 


•  TCAS  issues  corrective  RAs  later  than  is  necessary  in  preventing  an  NMAC,  while  the  DP 
logic  alerts  earlier; 

•  TCAS,  being  strongly  biased  against  altitude-crossing  RAs,  issues  non-altitude-crossing  RAs 
that result in NMACs, while the DP logic makes no such distinction; and

•  TCAS  either  fails  to  alert  or  reverses  the  sense  of  an  RA  but,  nonetheless,  the  encounter 
results  in  an  NMAC. 


Figure  27  shows  another  simulated  encounter  where  TCAS  prevents  an  NMAC  but  the  DP 
logic fails. The intruder climbs at approximately 1800 ft/min for 27 seconds. Twenty-one seconds
into the encounter, while the intruder is still climbing, the DP logic advises own aircraft to descend
at 1500 ft/min. This is because the expected cost of climbing or remaining level is greater than the
expected  cost  of  descending.  After  the  five-second  pilot  delay,  own  aircraft  begins  descending  and 
continues  to  descend  for  the  remainder  of  the  encounter.  An  NMAC  occurs,  however,  because  the 
intruder  begins  to  level  off  as  own  aircraft  is  descending. 
Figure 27. Simulated encounter in which TCAS prevents an NMAC that the DP logic fails to prevent. (a) Horizontal profile. (b) Vertical profile.


This encounter evolved into an NMAC because, although the alert to descend at 1500 ft/min
seemed  reasonable  when  the  intruder  was  climbing,  this  alert  was  inappropriate  when  the  intruder 
began  leveling  off.  Similar  to  the  DP  logic,  TCAS  issues  an  altitude-crossing  descend  RA,  which 
is  then  changed  to  a  non-altitude-crossing  descend  RA.  However,  as  the  encounter  progresses,  the 
RA  is  strengthened  to  an  increase  descent  RA,  allowing  the  aircraft  to  cross  safely.  Currently  no 
provisions  have  been  made  in  the  DP  logic  to  change  the  RA  when  the  evolution  of  the  encounter 
requires  it.  Maintaining  the  vertical  rate  that  the  DP  logic  indicates  for  the  remainder  of  the 
encounter  could  lead  to  unsafe  encounters.  Extending  the  DP  logic  to  include  strengthening  and 
reversals  is  discussed  in  Section  7.3. 


5.6  DISCUSSION 

This  section  showed  that  the  DP  logic  performed  relatively  well  compared  to  the  existing  TCAS 
logic.  However,  the  current  implementation  of  the  DP  logic  was  at  a  disadvantage  against  the 
TCAS  logic  in  the  three-dimensional  simulations  for  several  reasons: 

1.  Motion  model:  The  DP  logic  is  calculated  using  a  model  that  only  captures  motion  in  the 
vertical  plane.  The  motion  in  the  horizontal  plane  is  important  to  model  because  it  affects 
the  time  to  potential  conflict.  Expanding  the  vertical  motion  model  to  capture  horizontal 
dynamics  is  discussed  in  Section  7.1. 

2. Sensor uncertainty: The DP logic made decisions assuming that the tracked sensor measurements
were correct. However, due to noise in the sensor measurements, making decisions
based on this assumption can lead to unnecessary or unsuccessful alerts. The sensor error
model should be used to estimate the state uncertainty, which can be used by the QMDP
value method discussed in Section 2.4.

3.  Action  space:  TCAS  allows  RAs  to  be  strengthened  and  reversed,  whereas  the  DP  logic 
must  commit  to  a  particular  RA  once  it  is  issued  for  the  remainder  of  the  encounter.  Therefore 
it  is  not  surprising  that  TCAS  resolves  more  NMACs  than  the  DP  logic.  Section  7.3  discusses 
how  additional  RAs  might  be  included  in  the  DP  logic. 

All three of these issues will be addressed in further work and should result in a significant improvement
in performance. Further work will also investigate other operating points in the CASSATT
simulation for the DP logic.
6. ALTERNATIVE APPROACHES


Kuchar  and  Yang  provide  a  thorough  survey  of  conflict  detection  and  resolution  methods  [33]. 
This section discusses some of the key approaches they identified as well as other approaches
introduced since their survey. A variety of other algorithms have been suggested in the literature,
but the majority are variations of one of the general approaches discussed in this section, or are
based on the dynamic programming or conflict probability estimation approaches discussed earlier.


6.1  POTENTIAL  FIELD  METHODS 

Potential fields have been a very popular method for collision avoidance in robotics since they
were  introduced  over  thirty  years  ago  [84-86]  and  have  been  applied  to  collision  avoidance  for 
aircraft  [87,88].  The  typical  potential  field  works  by  exerting  virtual  forces  on  the  aircraft,  usually 
an  attractive  one  from  the  goal  or  waypoint  and  repelling  ones  from  nearby  traffic.  The  approach 
is  very  simple  to  describe  and  easy  to  implement,  but  it  has  some  fundamental  problems  [89]. 

One problem with potential field methods is that they follow a greedy strategy that is subject to
local minima, which can lead to the aircraft getting "stuck." For example, the sum of the repulsive
forces from the traffic can cancel the attractive force from the goal, resulting in zero net force.
The problems with local minima have led to the development of heuristics on top of the potential
fields to escape such traps and eventually to the construction of navigation functions, which are
potential  fields  with  unique  minima.  These  navigation  functions  are  global  and  are  equivalent  to 
value  functions  [90]. 
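
The cancellation of forces is easy to reproduce numerically. The sketch below uses one common textbook form, a linear attraction toward the goal and an inverse-square repulsion from each traffic aircraft; the gains and geometry are contrived so that the forces cancel exactly, and none of it is taken from the cited methods.

```python
import math


def net_force(pos, goal, traffic, k_att=1.0, k_rep=40.0):
    """Sum of a linear attractive force toward the goal and
    inverse-square repulsive forces away from each traffic position.

    Positions are (x, y) tuples; gains are illustrative assumptions.
    """
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in traffic:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        fx += k_rep * dx / d ** 3  # magnitude k_rep / d^2, directed away
        fy += k_rep * dy / d ** 3
    return fx, fy


# Traffic sits directly between the aircraft and its goal.
stuck = net_force(pos=(0.0, 0.0), goal=(10.0, 0.0), traffic=[(2.0, 0.0)])
```

With these values the attraction of 10 toward the goal is matched by a repulsion of 40 / 2^2 = 10 away from the traffic, so a controller that simply follows the net force does not move even though the goal has not been reached.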

Another  issue  with  potential  fields  is  that  they  do  not  take  into  account  the  probabilistic 
dynamics of the intruders. The potential fields are typically defined in two or three spatial dimensions
without taking into consideration the aircraft dynamics. If the aircraft are moving at different
speeds,  large  virtual  forces  are  needed  to  repel  the  traffic.  If  the  forces  are  too  small,  a  collision 
can  result  with  a  fast  intruder.  If  the  forces  are  too  large,  slower  aircraft  will  deviate  unnecessarily 
from  their  intended  course.  These  dynamic  problems  are  often  overlooked  for  slow-moving  mobile 
robots,  but  they  are  more  significant  in  aircraft. 

Perhaps the most significant problem with potential fields in the context of last-minute collision
avoidance is that they do not account for uncertainty in control or observation. Because of
the  short  time  frame  and  the  catastrophic  nature  of  collision,  decisions  must  be  robust  to  noise  in 
the  sensor  measurements  and  unexpected  deviation  from  the  projected  flight  course. 


6.2 RAPIDLY-EXPLORING RANDOM TREES

Rapidly-exploring random trees (RRTs) were originally developed in the context of robot navigation
[91]. RRTs explore the configuration space of the robot by generating random samples and
connecting  them  to  one  or  more  trees  rooted  at  the  origin  and  at  the  destination.  RRTs  exhibit 
what  is  known  as  Voronoi-bias,  which  allows  them  to  sample  preferentially  in  previously  unexplored 
areas, and so they tend to cover the space more quickly than a random walk. Typically, RRTs find
paths  from  the  start  to  the  goal  long  before  the  space  is  densely  sampled. 

RRTs  have  been  remarkably  successful  in  practice.  They  have  become  the  solution  of  choice 
for high-dimensional robot motion planning problems and for kinodynamic problems that involve
not  just  positions  but  velocities  [92].  The  MIT  autonomous  ground  vehicle  that  completed  the 
DARPA  Urban  Challenge  used  RRTs  to  successfully  drive  in  lanes,  execute  three-point  turns,  park, 
and  maneuver  with  other  traffic  and  obstacles  [93].  RRTs  have  also  been  used  for  obstacle  avoidance 
for  micro  air  vehicles  [94]. 

To  balance  their  many  desirable  properties,  it  is  important  to  note  that  RRTs  make  no 
guarantee  of  optimality  of  the  path  they  find.  In  practice,  the  paths  found  by  RRTs  can  be  quite 
poor due to the fact that they are constructed by joining together random samples. RRT-based
motion  planners  typically  go  through  a  second  phase  of  smoothing  the  resulting  path  to  get  one 
that  is  acceptable,  but  generally  far  from  optimal  [95].  The  other  limitation  of  RRTs  is  that  they 
do  not  consider  uncertainty,  which  is  critical  to  airborne  collision  avoidance.  There  is  some  current 
research  aimed  at  extending  RRTs  for  solving  MDPs  and  situations  where  the  environment  model 
is  uncertain,  but  the  results  are  preliminary  [96]. 


6.3  GEOMETRIC  OPTIMIZATION 

Bilimoria  introduced  a  lateral  conflict  resolution  algorithm  based  on  geometric  optimization  [28]. 
Given  position  and  velocity  information,  the  algorithm  uses  straight-line  projection  to  determine 
whether  the  own  aircraft  will  penetrate  a  circular  protected  zone  of  an  intruder.  If  a  conflict  is 
predicted  to  occur,  the  algorithm  will  compute  the  minimal  change  in  velocity  required  to  avoid  the 
protected  area  of  an  intruder.  The  assumption  of  constant  velocity  allows  the  algorithm  to  quickly 
compute  an  analytical  solution.  KB3D  is  a  generalization  of  this  algorithm  to  three-dimensional 
space  where  cylinders  define  the  protected  zones  of  intruders  [29].  The  algorithm  outputs  the 
minimal  change  in  airspeed,  vertical  rate,  or  heading  required  to  prevent  the  aircraft  from  entering 
a  protected  zone. 
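
The detection step shared by these algorithms, straight-line projection against a circular protected zone, can be sketched as follows. Only conflict detection is shown; computing the minimal velocity change is not attempted, and the function names and units are assumptions.

```python
import math


def closest_approach(rel_pos, rel_vel):
    """Time and distance of closest approach under constant velocity.

    rel_pos, rel_vel: intruder position and velocity relative to own
    aircraft, as (x, y) tuples. Only future times (t >= 0) count.
    """
    px, py = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t = 0.0  # no relative motion: separation never changes
    else:
        t = max(0.0, -(px * vx + py * vy) / speed_sq)
    return t, math.hypot(px + vx * t, py + vy * t)


def predicts_conflict(rel_pos, rel_vel, zone_radius):
    """True if the projected path penetrates the circular zone."""
    _, miss_distance = closest_approach(rel_pos, rel_vel)
    return miss_distance < zone_radius
```

Because the projection is a closed-form expression, the check costs a handful of arithmetic operations per intruder, which is the speed advantage the text attributes to the constant-velocity assumption.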

These geometric optimization approaches typically lend themselves to fast, analytical solutions.
Because they recommend the minimal velocity change, the projected flight path (assuming
constant  velocity)  will  always  barely  miss  the  edge  of  the  protected  zone.  If  this  approach  is  to 
be  used  for  collision  avoidance  with  real  aircraft,  the  protected  zone  would  need  to  be  enlarged 
to  make  the  system  robust  to  flight  path  variability.  Future  work  can  investigate  the  SOC  curve 
traced when varying the size of the protected zone. It would be interesting to compare how closely
a geometric optimization curve approaches those in Section 5.


6.4  POLICY  SEARCH 

One approach is to choose a policy representation whose behavior is defined by a vector of parameters
$\theta$. The choice of parameters requires insight into the structure of the problem and engineering
judgment. The choice of policy representation is flexible; for example, one could use pseudocode
resembling the current version of TCAS or a neural network. Local or global search methods can
search the space of parameters for an optimal $\theta$ that minimizes $J(\theta)$, which is the expected cost
of following the policy determined by $\theta$ assuming some distribution over initial states. One advantage
of policy search is that the state space does not require discretization, therefore allowing the
approach to better scale to higher dimensional problems.

A local search method known as policy gradient starts with some initial parameter vector
$\theta_0$ and estimates the gradient $\nabla_\theta J(\theta_0)$ using sampling. The next parameter vector $\theta_1$ is chosen
by taking a small step in the direction opposite the gradient, since $J$ is being minimized, and the
process continues until reaching a local minimum [97]. Policy gradient has been applied to aircraft collision avoidance [52,73] and
autonomous  helicopter  flight  [98],  among  other  domains. 
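
As a rough illustration of this update rule, the sketch below runs sampled gradient descent on a toy cost. The quadratic `expected_cost`, the finite-difference estimator, and all step sizes are assumptions made for this example; in practice $J(\theta)$ would be estimated by simulating encounters under the policy, and the cited work may use different gradient estimators.

```python
import random


def expected_cost(theta, rng, n_samples=200):
    """Noisy Monte Carlo estimate of a toy cost J(theta) = sum((theta_i - 1)^2)."""
    total = 0.0
    for _ in range(n_samples):
        noise = rng.gauss(0.0, 0.1)  # stands in for simulation variability
        total += sum((t - 1.0) ** 2 for t in theta) + noise
    return total / n_samples


def estimate_gradient(theta, rng, delta=0.1):
    """Central finite-difference gradient estimate built from sampled costs."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += delta
        down = list(theta)
        down[i] -= delta
        grad.append((expected_cost(up, rng) - expected_cost(down, rng)) / (2 * delta))
    return grad


def policy_gradient_descent(theta0, step=0.1, iterations=100, seed=0):
    """Repeatedly step against the estimated gradient to reduce the cost."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(iterations):
        grad = estimate_gradient(theta, rng)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta
```

Because the toy cost has a single minimum, the search settles near $\theta = (1, 1)$ from any start; with a multimodal cost the same loop would stop at whichever local minimum is nearest.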

The gradient descent approach will not necessarily find the globally optimal parameter vector
due to local minima. Various methods have been suggested to make local search more robust to
local minima, such as tabu search [99] and simulated annealing [100]. Another approach is to
use genetic algorithms or some other evolutionary search method to find a policy that minimizes
$J(\theta)$ [101-103]. Durand, Alliot, and Medioni demonstrated how genetic algorithms can be used to
find a neural network controller to resolve lateral conflicts between aircraft [104].
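
The following is a minimal sketch of a mutation-only evolutionary search, a simpler relative of the genetic algorithms cited above (it has no crossover). The one-dimensional multimodal `toy_cost` and every numeric setting are hypothetical and chosen only to show the loop structure.

```python
import math
import random


def toy_cost(theta):
    """Hypothetical multimodal stand-in for J(theta); global minimum at 0."""
    return theta ** 2 + 3.0 * (1.0 - math.cos(2.0 * math.pi * theta))


def evolutionary_search(cost, pop_size=30, n_parents=10, generations=50,
                        sigma=0.3, seed=0):
    """Keep the best candidates each generation and add mutated copies of them."""
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Fitness evaluation: rank candidates by cost (lower is better).
        population.sort(key=cost)
        history.append(cost(population[0]))
        parents = population[:n_parents]
        # Parents survive unchanged, so the best cost never gets worse.
        children = [rng.choice(parents) + rng.gauss(0.0, sigma)
                    for _ in range(pop_size - n_parents)]
        population = parents + children
    best = min(population, key=cost)
    history.append(cost(best))
    return best, history
```

Keeping the parents in the population is a deliberate design choice here: it guarantees that the recorded best cost is non-increasing, at the price of slower exploration.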

The computation involved in policy search can be done offline by powerful computers, as
opposed to online by the collision avoidance system during flight. The "fitness evaluation" stage
of evolutionary algorithms can be parallelized, and local search can be conducted from a large
collection of initial points in the parameter space in parallel. However, adequately estimating
$J(\theta)$ or its gradient requires many samples, even when using importance sampling, which can make
the pace of the search very slow. Finding optimal solutions may be faster using a dynamic programming
formulation (Section 3) because it explicitly leverages models of aircraft dynamics and sensor error.


6.5  NONSTATIONARY  APPROXIMATE  POLICY  ITERATION 

Another approach is to search for a policy in stages, working backwards from the terminal states
(i.e., when there is a collision or there is negligible probability of conflict occurring). The learning
problem at each stage is treated as a supervised multiclass classification problem, which can leverage
any classifier suited for the domain. This kind of approach, known as nonstationary approximate
policy iteration [105] or policy search by dynamic programming [106], has been applied by Kaelbling
and Lozano-Perez to a hypothetical collision avoidance scenario similar to the one explored in this
report [52]. The policy generated using this approach is called nonstationary because it is time
dependent. Instead of being defined by a simple mapping $\pi$ from states to actions, the nonstationary
policies are defined by a sequence of mappings $(\pi_1, \ldots, \pi_k)$ from states to actions, where
$\pi_i$ dictates the action to execute $i$ steps before termination.

The algorithm suggested by Kaelbling and Lozano-Perez begins by simulating a large collection
of trajectories through the state space until they reach a terminal state. All of the states from
the trajectories are partitioned into a collection of sets, $D_0, \ldots, D_k$, where $D_i$ contains all
the states $i$ steps from termination.

The following procedure is used to define $\pi_1$. For each state in $D_1$, the expected cost for
executing each action is estimated using sampling (Appendix D). Each state is paired with the
action that resulted in the lowest expected cost. This set of state-action pairs serves as a training
set for a multiclass classification algorithm [61,62], resulting in $\pi_1$ that maps states to actions.

Once $\pi_i$ is known, $\pi_{i+1}$ may be computed as follows. For each state in $D_{i+1}$, the expected
cost-to-go for executing each action is estimated by sampling the next state and then following the
nonstationary policy defined by $(\pi_1, \ldots, \pi_i)$. Each state is paired with the action that
resulted in the lowest expected cost-to-go. This set of state-action pairs serves as a training set for
a multiclass classification algorithm, resulting in $\pi_{i+1}$ that maps states to actions.

As with policy search, this approach does not attempt to represent the cost-to-go function,
which may allow it to scale to more complex problems. The policy can be compactly represented
if the series of classifiers $\pi_1, \ldots, \pi_k$ are defined by a series of relatively short parameter
vectors $\theta_1, \ldots, \theta_k$. The success of this algorithm depends upon the choice of classifier,
sampling scheme, and the number of training examples.
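
The staged procedure can be sketched on a toy problem as follows. Everything specific here is an assumption for illustration: the scalar state (vertical separation `h`), the Gaussian step noise, the cost values, the random exploration used to collect states, and the one-nearest-neighbor classifier. None of it is taken from the cited implementation.

```python
import bisect
import random

ACTIONS = (-1, 0, 1)  # descend, no alert, climb (toy units per step)


def step(h, a, rng):
    """Toy dynamics: vertical separation changes by the action plus noise."""
    return h + a + rng.gauss(0.0, 0.5)


def terminal_cost(h):
    """Unit cost if the aircraft end within 1 unit of each other."""
    return 1.0 if abs(h) < 1.0 else 0.0


class NearestNeighbor:
    """Minimal 1-nearest-neighbor classifier over a scalar state."""

    def __init__(self, pairs):
        self.pairs = sorted(pairs)
        self.keys = [h for h, _ in self.pairs]

    def __call__(self, h):
        j = bisect.bisect_left(self.keys, h)
        candidates = self.pairs[max(j - 1, 0):j + 1]
        return min(candidates, key=lambda p: abs(p[0] - h))[1]


def rollout_cost(h, a, policies, rng):
    """Cost of taking a now, then following (pi_i, ..., pi_1) to termination."""
    cost = 0.1 * (a != 0)  # small penalty for alerting
    h = step(h, a, rng)
    for pi in reversed(policies):
        a = pi(h)
        cost += 0.1 * (a != 0)
        h = step(h, a, rng)
    return cost + terminal_cost(h)


def train(k=3, n_states=60, n_samples=30, seed=0):
    """Return [pi_1, ..., pi_k], built backwards from termination."""
    rng = random.Random(seed)
    # D[i] collects states observed i steps before termination.
    D = {i: [] for i in range(1, k + 1)}
    for _ in range(n_states):
        h = rng.uniform(-4.0, 4.0)
        for i in range(k, 0, -1):
            D[i].append(h)
            h = step(h, rng.choice(ACTIONS), rng)
    policies = []
    for i in range(1, k + 1):
        pairs = []
        for h in D[i]:
            def mean_cost(a):
                samples = (rollout_cost(h, a, policies, rng) for _ in range(n_samples))
                return sum(samples) / n_samples
            pairs.append((h, min(ACTIONS, key=mean_cost)))
        policies.append(NearestNeighbor(pairs))  # classifier for stage i
    return policies
```

A call such as `train()[0](0.2)` returns the action that the stage-1 classifier assigns to a separation of 0.2 one step before termination. Swapping `NearestNeighbor` for another classifier changes only the line that builds each stage.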

6.6  DISCUSSION 

Perhaps the primary strength of approaches based on dynamic programming and probabilistic
conflict estimates is that they directly account for uncertainty in the future path of the aircraft.
Of the approaches discussed in this section, only the policy search and nonstationary approximate
policy iteration methods leverage models of uncertainty. For last-minute collision avoidance,
accounting for sensor error and dynamic uncertainty is critically important. Many of the algorithms
suggested in the literature that are based on the other approaches are intended to assist in air traffic
control separation, where accounting for these uncertainties is less important because of the larger
time scale involved.


7. FURTHER RESEARCH


There are several ways in which the simple collision avoidance model needs to be extended
to handle realistic encounter scenarios. This section outlines how the existing model can be
extended and discusses potential computational issues that may arise and how to address them. The
main priorities for further research in the short term include extending the model to three spatial
dimensions and equipped intruders, investigating model robustness, and exploring different policy
representations and how they may impact certification.


7.1 THREE-DIMENSIONAL DYNAMICS

One way to extend the hypothetical collision avoidance model (Section 1.3) into three dimensions
is to dispense with $\tau$ (time to horizontal conflict) and add the following variables:

• horizontal range to intruder $r$,

• bearing of intruder $\chi$,

• own ground speed $v_1$,

• intruder ground speed $v_2$,

• relative heading of intruder $\beta$,

• own turn rate $\phi_1$, and

• intruder turn rate $\phi_2$.

A point in the state space would then be an 11-dimensional tuple

$$(h, r, \chi, \dot{h}_1, \dot{h}_2, v_1, v_2, \beta, \phi_1, \phi_2, s_{\mathrm{RA}}),$$

which uniquely specifies (in a convenient coordinate system) the velocity vectors of both aircraft,
the relative position of the intruder, and the RA response state. A probabilistic model can be used
to specify the dynamics. Unfortunately, applying the dynamic programming approach suggested
in Section 3 might be impractical because of the dimensionality of the state space. A grid with 21
edges per dimension would result in $3.5 \times 10^{14}$ discrete states.
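
The state count quoted above follows directly from raising the grid resolution to the number of dimensions. The short sketch below reproduces the arithmetic; the helper name `grid_size` is introduced only for this example, and it treats every dimension as having 21 grid points, as the text does.

```python
def grid_size(points_per_dim, n_dims):
    """Number of discrete states in a regular grid."""
    return points_per_dim ** n_dims


# Growth of the grid as dimensions are added, at 21 points per dimension.
for n_dims in (3, 7, 11):
    print(n_dims, f"{grid_size(21, n_dims):.1e}")
# The 11-dimensional case prints 3.5e+14.
```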

One way to reduce the dimensionality of the state space is to decompose the problem into
motion along the horizontal plane and motion along the vertical plane. The dynamics in the
horizontal plane can be modeled as an uncontrolled Markov process over the variables $r$, $\chi$,
$v_1$, $v_2$, $\beta$, $\phi_1$, and $\phi_2$. The dynamics in the vertical plane would be modeled as
discussed in Section 1.3 over the variables $h$, $\tau$, $\dot{h}_1$, $\dot{h}_2$, and $s_{\mathrm{RA}}$.
Transforming the 11-dimensional problem into these two problems of smaller dimension requires that
the vertical rates and vertical separation be independent of the behavior in the horizontal plane.
Prior TCAS studies have made this assumption [15,76].

Figure 28. Estimation of time to horizontal conflict.

A distribution over $\tau$ can be inferred from the uncontrolled Markov process over
$x_{\mathrm{horiz}} = (r, \chi, v_1, v_2, \beta, \phi_1, \phi_2)$. One way to estimate this distribution
is to sample a large collection of trajectories starting from the current state (Figure 28). The
variable $\tau$ may be discretized to one-second intervals up to the time horizon $\tau_{\max}$:
$[0, 1), [1, 2), \ldots, [\tau_{\max}, \infty)$. A histogram over these bins can be inferred from the
samples. As with conflict probability estimation (Section 4), importance sampling techniques can
result in improved estimates of the distribution $\Pr(\tau \mid x_{\mathrm{horiz}})$.
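
A minimal sketch of the sampling approach is given below. It uses plain Monte Carlo rather than importance sampling, and it replaces the seven-variable horizontal state with a hypothetical two-variable one (range and closing speed) whose noise level and conflict radius are arbitrary illustrative values.

```python
import random


def sample_tau(r, closing_speed, rng, tau_max=40, conflict_radius=150.0,
               accel_sd=3.0):
    """Simulate one trajectory in 1 s steps; return the bin index of the conflict time."""
    for t in range(tau_max):
        closing_speed += rng.gauss(0.0, accel_sd)  # random speed perturbation
        r -= closing_speed
        if r < conflict_radius:
            return t  # conflict occurred in the interval [t, t + 1)
    return tau_max  # no conflict before the horizon: bin [tau_max, inf)


def tau_distribution(r, closing_speed, n_samples=2000, tau_max=40, seed=0):
    """Histogram estimate of Pr(tau | x_horiz) over tau_max + 1 bins."""
    rng = random.Random(seed)
    counts = [0] * (tau_max + 1)
    for _ in range(n_samples):
        counts[sample_tau(r, closing_speed, rng, tau_max)] += 1
    return [c / n_samples for c in counts]
```

For example, `tau_distribution(3000.0, 100.0)` returns 41 probabilities, one for each one-second bin plus the final open-ended bin.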

Another way to estimate $\Pr(\tau \mid x_{\mathrm{horiz}})$ is to use dynamic programming, which may be
done offline for the entire discrete state space. Appendix G provides details of an algorithm. During
execution, computing $\Pr(\tau \mid x_{\mathrm{horiz}})$ for arbitrary $x_{\mathrm{horiz}}$ would involve
interpolating (Appendix C) distributions at nearby states. If the dimensionality of $x_{\mathrm{horiz}}$
proves to be computationally burdensome, it may be worth assuming constant ground speed and turn rate
to bring the number of dimensions down to three from seven. Of course, dynamic programming would then
need to be run online.

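Once $\Pr(\tau \mid x_{\mathrm{horiz}})$ is known, the QMDP value method (Section 2.4) can be used to
choose the best action

$$\arg\min_a \sum_{\tau} \Pr(\tau \mid x_{\mathrm{horiz}}) \, J^*\big((h, \tau, \dot{h}_1, \dot{h}_2, s_{\mathrm{RA}}), a\big), \tag{33}$$

where

$$J^*(s, a) = c(s, a) + \sum_{s'} \Pr(s' \mid s, a) \, J^*(s'). \tag{34}$$

When $\tau > \tau_{\max}$, $J^*$ is set to zero.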
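
The action selection rule in Equation (33) can be sketched as a weighted sum over the bins of the time-to-conflict distribution. In the sketch below, `toy_cost_to_go` is a hypothetical stand-in for the optimal cost-to-go table (it ignores the vertical rates and the RA state), and the rates and thresholds are illustrative values only.

```python
ACTIONS = ("no alert", "climb", "descend")
RATES = {"no alert": 0.0, "climb": 25.0, "descend": -25.0}  # ft/s, toy values


def toy_cost_to_go(h, tau, action):
    """Hypothetical stand-in for J*((h, tau, ...), a): alert cost plus conflict cost."""
    alert_cost = 0.0 if action == "no alert" else 0.1
    miss = abs(h + RATES[action] * tau)  # projected vertical separation at tau
    return alert_cost + (1.0 if miss < 100.0 else 0.0)


def best_action(tau_dist, h, cost_to_go=toy_cost_to_go):
    """Minimize the sum over tau of Pr(tau | x_horiz) * J*(., a).

    The last bin of tau_dist covers tau beyond the horizon, where the
    cost-to-go is taken to be zero, so it is left out of the sum.
    """
    def expected_cost(action):
        return sum(p * cost_to_go(h, tau, action)
                   for tau, p in enumerate(tau_dist[:-1]))
    return min(ACTIONS, key=expected_cost)
```

With all probability mass on a conflict 10 s away and zero vertical separation, the sketch selects an alert; with all mass beyond the horizon, every action has zero expected cost and the first option, no alert, is returned.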

7.2  NONDETERMINISTIC  PILOT  RESPONSE 

Kuchar and Drumm analyzed TCAS monitoring data in the Boston area from June 2005 to January
2006, and they observed the following:

Examination of the data ... indicates that only 13% of pilot responses met the assumption
used by TCAS: pilot responses within 5 seconds and achieving a 1500 ft/min vertical
rate. In 63% of the cases, the pilots maneuvered in the proper direction but were not as
aggressive or prompt as TCAS assumed. Pilots maneuvered in the opposite direction
to the RA in 24% of the cases. Some of these opposite responses are believed to be due
to visual acquisition of the intruder aircraft and the pilot's decision that following the
RA was not necessary. [4]

Figure 29. Transition model for $s_{\mathrm{RA}}$ where the response delay is given by a geometric distribution.

In the hypothetical collision avoidance problem, the pilot of the own aircraft always responds
deterministically to RAs, which reduces the amount of uncertainty the collision avoidance system
must account for when deciding when to issue an RA. If the model allows for late or opposite
responses, then the optimal logic will likely issue alerts earlier.

The transition probabilities for the $s_{\mathrm{RA}}$ variable would need to be altered to allow for
nondeterministic responses. For example, instead of transitioning deterministically from clear of
conflict to climb in 4 s when issuing a climb RA, the model might allow a transition to climb in 1 s
or climb in 10 s or even never climb with some probability.

Alternatively, the transition probabilities for the $s_{\mathrm{RA}}$ variable can be as shown in
Figure 29. Once a climb or descend advisory is issued, $s_{\mathrm{RA}}$ will go into a "holding state"
(either will climb or will descend) from which it will transition to climbing or descending with some
probability $p$ at each time step. The distribution over the time spent in the holding state is given
by a geometric distribution with mean $1/p$. The mean total delay, which includes the step from clear
of conflict to one of the holding states, is $1 + 1/p$. The probability $p$ can be chosen to provide
the desired average delay. For example, $p = 0.25$ provides a 5 s delay on average. One helpful
byproduct of this approach is that it reduces the size of the state space since explicit states are no
longer required for delay in 4 s, ..., delay in 1 s (as in Appendix A).
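
The mean delay of $1 + 1/p$ can be checked with a short simulation. The sketch below assumes one-second time steps, as in the text; the function names are introduced only for this example.

```python
import random


def mean_total_delay(p):
    """Analytical mean delay: one step into the holding state plus a geometric wait."""
    return 1.0 + 1.0 / p


def simulate_delay(p, rng):
    """Sample the number of steps from issuing the RA until the pilot responds."""
    steps = 1  # step from clear of conflict into the holding state
    while True:
        steps += 1  # one more time step spent in the holding state
        if rng.random() < p:
            return steps  # transition to climbing or descending


def estimate_mean_delay(p, n_samples=20000, seed=0):
    """Monte Carlo estimate of the mean total delay."""
    rng = random.Random(seed)
    return sum(simulate_delay(p, rng) for _ in range(n_samples)) / n_samples
```

For $p = 0.25$, `mean_total_delay` returns 5.0, and the simulated estimate should fall close to that value.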

7.3  STRENGTHENING  AND  REVERSAL  ADVISORIES 

The hypothetical collision avoidance problem in this report does not allow the system to change
the RA once it has been issued. However, it is critical that TCAS be able to either strengthen or
reverse RAs as the encounter develops because pilots often do not follow their RAs exactly or even
at all. According to Kuchar and Drumm [4], the mid-air collision of a Russian Tu-154 and a DHL
B-757 over Überlingen in 2002 might have been averted if TCAS had properly reversed the RA it
had issued to the DHL aircraft.

The current version of TCAS incorporates reversal logic. According to TCAS monitoring
data obtained from seven sensors since 2008, approximately 1% of RA encounters involve reversals.
The logic in TCAS responsible for RA changes is rather complex in order to address issues
identified in simulation and during operational use. Modeling collision avoidance as an optimization
problem may result in logic that is more robust to unexpected events without requiring substantial
engineering effort and the development of complex logic.

The MDP framework can be extended to allow strengthening and reversal. The behavior
of the $s_{\mathrm{RA}}$ variable (Figure A-1) would require adjustment and the number of actions
available from non-clear-of-conflict states would need to be expanded. The cost function may also need
adjustment, depending on whether it is desirable to penalize strengthening or reversing RAs.

The approach that involves making decisions based on the probability of conflict (Section 4)
can also be extended to allow strengthening and reversal. If it is determined that a different RA
provides a lower probability of conflict than the current RA, then the system would change the RA.
Experiments would be able to reveal how well this approach compares to the MDP framework.
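
The switching rule described above amounts to a comparison of conflict probabilities. The sketch below assumes the probabilities have already been estimated for each candidate RA; the `margin` parameter is a hypothetical addition, not part of the report, included to show how one might avoid switching on negligible differences.

```python
def select_ra(current_ra, conflict_prob, margin=0.0):
    """Change the RA only if another RA offers a lower conflict probability.

    conflict_prob maps each candidate RA to its estimated probability of
    conflict. margin (an assumption of this sketch) is the improvement
    required before a change is made.
    """
    best = min(conflict_prob, key=conflict_prob.get)
    if conflict_prob[best] < conflict_prob[current_ra] - margin:
        return best
    return current_ra


estimates = {"climb": 0.2, "strengthen climb": 0.1, "descend": 0.05}
print(select_ra("climb", estimates))              # descend (a reversal)
print(select_ra("climb", estimates, margin=0.2))  # climb (improvement too small)
```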


7.4  EQUIPPED  INTRUDERS 

This report has only considered intruders that are not equipped with a collision avoidance system.
However, a future version of TCAS will have to interact with intruders equipped with the future and
legacy versions of TCAS. Separate sublogics for each equipage category can be optimized according
to different dynamic models.

Applying  dynamic  programming  to  problems  with  an  intruder  with  the  current  version  of 
TCAS  may  involve  adding  additional  information  to  the  state  representation  and  modifying  the 
transition  model.  However,  for  an  intruder  with  this  future  collision  avoidance  system,  the  logic 
needs  to  be  jointly  optimized  on  both  aircraft. 

One way to jointly optimize the logic on both aircraft is to define the state variables relative
to the aircraft with the lower Mode S address. (The current version of TCAS uses Mode S address
to break ties in coordinated encounters.) The action space would be the set of possible pairs
of resolution advisories for both aircraft: {(no alert, no alert), (climb, descend), ...}. The dynamic
model would take into account the RAs of both aircraft. The cost function would be similar to
before, except that the cost imposed for alerting may depend on whether one or both aircraft alert.
It may be desirable to enforce the constraint that both aircraft alert simultaneously. The optimal
joint policy can be found using dynamic programming.

When an aircraft equipped with this future system discovers an intruder with the same system,
it will estimate the state relative to the aircraft with the lower Mode S address. It will then apply
the policy computed using dynamic programming to determine the best pair $(a_1, a_2)$. If the own
aircraft has the lower Mode S address, it will execute action $a_1$; otherwise, it will execute action
$a_2$. Ideally, both aircraft will have estimated the same exact state, agreed upon the same action
pair, and executed their respective actions. However, it is possible that sensor noise results in
the two aircraft believing they are in different states, which may result in disagreement over the
action pairing. One way to mitigate this problem is to have the aircraft with the lower Mode S
address dictate the action of both aircraft over the data link. The current version of TCAS uses a
similar coordination scheme, but further research would be required to assess potential issues and
vulnerabilities of such an approach.
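
The coordination scheme can be sketched as follows. The function names, the scalar state, and the two-entry `toy_joint_policy` are hypothetical; the sketch only illustrates how address ordering selects $a_1$ or $a_2$ and how a transmitted pair removes disagreement caused by differing state estimates.

```python
def choose_own_action(own_addr, intruder_addr, state_estimate, joint_policy,
                      received_pair=None):
    """Return (own action, pair to transmit or None).

    state_estimate is defined relative to the aircraft with the lower
    Mode S address. joint_policy maps it to a pair (a1, a2).
    """
    if own_addr < intruder_addr:
        pair = joint_policy(state_estimate)
        return pair[0], pair  # execute a1 and dictate the pair over the link
    if received_pair is not None:
        return received_pair[1], None  # follow the dictated a2
    return joint_policy(state_estimate)[1], None  # fall back on own estimate


def toy_joint_policy(relative_altitude):
    """Hypothetical joint policy: the higher aircraft climbs, the lower descends."""
    if relative_altitude >= 0:
        return ("climb", "descend")
    return ("descend", "climb")


# Lower-address aircraft estimates it is 10 units above the intruder.
a_low, sent = choose_own_action(0x100, 0x200, 10.0, toy_joint_policy)
# Without the link, a noisy estimate of -5 leads the other aircraft to climb too.
a_high_alone, _ = choose_own_action(0x200, 0x100, -5.0, toy_joint_policy)
# With the link, the dictated pair restores complementary actions.
a_high_linked, _ = choose_own_action(0x200, 0x100, -5.0, toy_joint_policy, sent)
```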

The current version of TCAS provides a much lower risk ratio when both aircraft are equipped
with TCAS [78]. It would be interesting to determine through Monte Carlo simulation how much
better a jointly optimal logic performs when both aircraft follow their RA. It would also be
interesting to determine how performance degrades when intruders do not follow their RA.


7.5  MODEL  ROBUSTNESS  ANALYSIS 

The methods discussed in this report assume that the internal model of the system dynamics and
sensor performance is correct. Before deploying a collision avoidance system that is optimized for
a particular model, it is important to assess the risk of an inaccurate model. Monte Carlo analysis
across a spectrum of gradually different models can reveal how inaccurate the internal model must
be before performance is significantly degraded.


7.6  NON-GAUSSIAN  DYNAMICS 

The dynamic model used in this report assumes that accelerations are chosen according to a
Gaussian distribution. However, the logic is evaluated using an encounter model with much more
sophisticated dynamics represented by a dynamic Bayesian network derived from radar data. It
may be worth experimenting with how much the policy improves if a more sophisticated dynamic
model is used for planning. If there is little to gain, then it may be wise to continue using the
simpler Gaussian model. However, if there is much to gain, then it could be worth adopting a more
complex model.


7.7  ADDITIONAL  RESOLUTION  ADVISORIES 

TCAS III, which is no longer under development and has not been deployed, was intended as a
more capable version of TCAS II that incorporated horizontal RAs such as "turn left." Experiments
in the late 1980s indicated that horizontal RAs were expected to provide a modest gain in safety
while significantly lowering the alert rate [107]. A more recent study showed that horizontal RAs
can make TCAS more efficient in some situations, such as crossing situations that can result in
reversal RAs [108]. Pilots generally view the concept of horizontal RAs favorably, although the
views of controllers tend to be less favorable [108].

Introducing horizontal RAs to the MDP formulation would require expanding the model into
three spatial dimensions. An MDP would be required to model the horizontal dynamics instead
of the uncontrolled Markov process discussed in Section 7.1. The action space would need to be
expanded to include horizontal RAs. The computational impact may be significant, but simulation
using an encounter model could quantify the anticipated benefit of horizontal RAs.


7.8  LEVERAGING  INTENT  INFORMATION 

Automatic Dependent Surveillance-Broadcast (ADS-B) enables aircraft to broadcast their position
and other relevant information to the ground and nearby traffic. It is expected that a future version
of TCAS will leverage this technology (at least to some degree) to improve collision avoidance.
Because ADS-B can broadcast intent information, it may be possible to reduce the uncertainty
in the future state of the aircraft, thereby lowering the false alert rate [109]. Since pilots do not
always follow their intended flight plan, the dynamic model used for planning must capture intent
deviation [110]. Experiments can confirm whether leveraging intent information significantly lowers
the rate of unnecessary alerts.


7.9  MULTIPLE  INTRUDERS 

As is to be expected, the multi-threat logic is perhaps one of the most complex components of TCAS.
This part of the logic has not been the focus of the same rigorous testing as the single-threat logic,
due in part to the rarity of multi-threat events. It was not until recently that an encounter model
based on operational data was developed for multi-threat situations [111]. Preliminary results
indicate that the current version of TCAS performs reasonably well against multiple intruders.

Expanding  the  MDP  model  to  capture  the  behavior  of  multiple  intruders  would  significantly 
increase  the  dimensionality  of  the  state  space.  The  fitted  value  iteration  approach  with  grid-based 
interpolation  of  the  value  function  (Section  3)  is  unlikely  to  scale  well  to  multiple  intruders  unless 
some  approximations  are  made.  Methods  that  use  conflict  probability  estimates  (Section  4)  or  other 
online  methods  will  likely  scale  better  in  multi-threat  situations.  Although  multi-threat  situations 
contribute  little  to  the  overall  risk  associated  with  TCAS  because  of  their  rarity,  it  is  important  to 
ensure  that  any  future  logic  be  robust  to  encounters  with  multiple  intruders. 


7.10  LOGIC  REPRESENTATION 

This report has focused on how to compute the optimal policy, but it has not dealt with the issue
of how the optimal policy will be used or represented. One option is to not use the optimal policy
directly as the future TCAS logic, but instead use the optimal policy as a tool to aid engineers
designing the new logic. The optimal policy can help justify parameter settings, such as DMOD
and ALIM. Human-engineered logic and the optimal logic can be compared in simulation. If the
human-engineered logic performs significantly worse in some situations compared to the optimal
logic, then those situations can be inspected and the logic revised to better match the optimal logic.
The output of the effort would be human-engineered pseudocode that can be certified like previous
versions of TCAS and then implemented by manufacturers.

One advantage of using human-designed pseudocode to represent the logic is that there is
already experience and familiarity with certifying pseudocode with prior versions of TCAS.
Although the pseudocode of the current version of TCAS is remarkably complex, an engineer could
read through the pseudocode and gain some understanding of how the logic works. Disadvantages
of using the optimal policy merely as a design aid include sacrificing performance and extending
the development process.

An alternative is to use the optimal policy directly. For example, the optimal cost-to-go
function could be represented as a table of values, as discussed in Section 3, and the logic would
involve performing the necessary calculations on these values to select the action with the lowest
expected cost. The certification process would require certifying the table and the limited amount
of pseudocode defining the required calculations. Instead of having the logic specified entirely in
pseudocode, this approach would shift most of the complexity of the logic into a large numerical
table. The certification of the table and accompanying pseudocode would be similar to the
certification process for prior versions of TCAS. The logic would need to be rigorously evaluated on
millions of encounters in simulation to ensure the logic is safe and meets operational requirements.
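
To give a sense of how little code a tabular representation might need, the sketch below selects an action by interpolating a one-dimensional cost-to-go table. The grid, the table values, and the function names are hypothetical, and a real table would be multidimensional (with the interpolation scheme of Appendix C rather than the simple linear rule assumed here).

```python
import bisect


def interpolate(grid, values, x):
    """Linear interpolation of tabulated values at x, clamped to the grid range."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    j = bisect.bisect_right(grid, x)
    w = (x - grid[j - 1]) / (grid[j] - grid[j - 1])
    return (1.0 - w) * values[j - 1] + w * values[j]


def select_action(grid, table, x):
    """Pick the action whose interpolated cost-to-go at x is lowest."""
    return min(table, key=lambda a: interpolate(grid, table[a], x))


# Toy table: cost-to-go at vertical separations of 0, 100, and 200 ft.
grid = [0.0, 100.0, 200.0]
table = {
    "no alert": [1.0, 0.5, 0.0],
    "climb": [0.2, 0.2, 0.2],
}
print(select_action(grid, table, 50.0))   # climb
print(select_action(grid, table, 180.0))  # no alert
```

Updating the logic in this representation means replacing the numbers in `table` while leaving `interpolate` and `select_action` unchanged.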

There are some advantages of a tabular representation over the pseudocode representation of
prior TCAS logics. For example, the table can be easily updated as the airspace changes (which
is anticipated over the next 20 years) without having to revise implementations of pseudocode.
A tabular representation would also reduce the amount of effort required by manufacturers to
implement the logic and validate that their implementations meet the logic specification.

There are several alternatives to a tabular representation of the cost-to-go function. One
alternative is to use regression to approximate the cost-to-go function using a parametric function.
This can be done using either an online or offline dynamic programming solution method. Another
alternative is to dispense with representing the cost-to-go function and switch to representing the
policy directly using a classifier trained on a large collection of samples [61,62]. A decision tree
would be one way to represent the classifier [112].

Depending on the representation, the decision-making process of the logic may not be entirely
transparent. Although a decision tree can be easily inspected and perhaps understood, a table of
values would not provide much insight into how the system made its decision. However, the
specification of how the representation was created would provide an understanding of the rationale
of the logic. Additionally, studying the policy plots (Section 3.4) can provide some insight into the
behavior of the logic. Because it is important that the behavior of the logic be understood as well
as possible due to its safety-critical nature, future research will need to further investigate ways of
visualizing and understanding the behavior of the system.


8. CONCLUSIONS


As the airspace evolves with the introduction of new air traffic control procedures and
surveillance systems, it is likely that the TCAS II threat detection and resolution logic will require
modification to meet safety and operational requirements. Due to the complexity of the logic, modifying
the logic may require significant engineering effort.

This report suggests a new approach to TCAS logic development where the engineering effort
is focused on developing models, allowing computers to optimize the logic according to agreed-upon
performance metrics. Because models of sensor characteristics, pilot response behavior, and aircraft
dynamics can be constructed from operational data, they should be straightforward to justify and
vet within the safety community. The optimization of the logic according to these models would be
done using principled techniques that are well established in theory and practice over the past 50
years. The performance metrics are based on quantities that have long been used by the aviation
safety community.

The objective of this report was to connect this concept of TCAS logic optimization to the
existing literature on model-based optimization, not to develop a particular conflict resolution
algorithm. Problems involving sequential decision making in dynamic environments are typically
modeled by Markov decision processes, where the state at the next decision point depends
probabilistically on the current state and the chosen action. Assuming some objective measure of cost,
the best action from the current state is the one that minimizes the expected future cost. Computing
the optimal action from all possible states requires a process known as dynamic programming,
which has been used for a wide variety of problems in computer science.

To illustrate some of the key concepts of how dynamic programming might be applied to
TCAS logic optimization, this report used a simple encounter model and evaluated it in simulation.
Because the model is defined by continuous variables, it is necessary to use some approximation
methods. This report explored interpolation-based methods and evaluated the resulting logic using
various performance metrics.

This report identified some of the issues with applying a dynamic programming approach.
One issue is the scalability of existing solution methods to higher dimensions. Adding
dimensions to the grid-based representation used in this report results in an exponential increase
in memory and computational requirements. As discussed in this report, there are several methods
worth exploring that may address these issues.

One approach that may scale well to higher dimensions involves using conflict probability
estimates to decide when to issue resolution advisories, as has been suggested by others. Although
this approach will not result in the optimal solution, it may approximate the optimal logic well.
One of the challenges of this approach is estimating the probability of a rare event, but this report
discussed a variety of techniques for estimating probability of conflict that are more accurate than
direct Monte Carlo sampling.

Besides methods based on dynamic programming and conflict probability estimates, several
other approaches have been suggested in the literature. This report discussed some of these
methods, highlighting their strengths and weaknesses. One of the primary strengths of the dynamic
programming approach over the other methods is that it directly leverages models of sensor
error and aircraft behavior to find the optimal logic. Future simulation studies can determine how
well these alternative methods perform compared to the optimal policy found through dynamic
programming.

Experiments using the simple encounter model indicate that dynamic programming is a
promising approach. However, further work will be required to extend the approach to three
spatial dimensions. This report outlined methods for handling three spatial dimensions while
keeping the memory and computational requirements tractable. Although computational requirements
could limit the success of applying dynamic programming to TCAS logic development, this report
has suggested ways to help manage the requirements for memory and computation.

This work has focused primarily on the computational aspect of optimizing collision avoidance
logic, but there are other issues, such as certification, that require further study. If this new
approach to developing logic is simply used as an aid to engineers who are developing or revising
collision avoidance pseudocode, then the use of these methods would have little impact on the
certification process. However, if the logic produced by dynamic programming or some other automated
process is to be used directly in a future version of TCAS, then the certification process may be
somewhat different. The core of the certification process will be the same, involving rigorous
simulation studies and flight tests to prove safety and demonstrate operational acceptability. However,
the vetting of the logic itself will involve more than just studying the logic that will be deployed on
the system. Depending on the representation of the logic, it may not be directly comprehensible
by an engineer. Therefore, confidence would have to be established in the safety community that
the methods used to generate the logic are sound. This report represents a first step in justifying
an automated approach for generating optimized TCAS logic.
APPENDIX A

HYPOTHETICAL  COLLISION  AVOIDANCE  DYNAMICS 


Section 1.3 introduced a hypothetical collision avoidance problem that was used as an
example to demonstrate the logic discussed in this report. This appendix provides a mathematical
specification of the underlying model. The model uses a state representation to encode all the
properties of the system that are of interest at any given time. The state at time $t$ is a vector

$$\mathbf{x}(t) = \begin{bmatrix} h(t) \\ \tau(t) \\ \dot{h}_1(t) \\ \dot{h}_2(t) \\ s_{RA}(t) \end{bmatrix}, \tag{A-1}$$

where $h$ is the altitude of the intruder relative to own, $\tau$ is the time to closest approach, $\dot{h}_1$ is
own vertical rate, and $\dot{h}_2$ is the intruder vertical rate. The magnitudes of $\dot{h}_1$ and $\dot{h}_2$ are limited to
$L = 2500$ ft/min. The $s_{RA}$ variable keeps track of the RA that has been issued so that the proper
0.25-g acceleration can be applied after a five-second delay. As Figure A-1 illustrates, 11 discrete
states are required.


The system dynamics are governed by the following hybrid, discrete-time Markov model:

$$\mathbf{x}(t + \Delta t) = \mathbf{f}(\mathbf{x}(t), \mathbf{w}(t), a(t)), \tag{A-2}$$

where

$$\mathbf{w}(t) = \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} \tag{A-3}$$

is a random variable representing the noise in the vertical rates of the aircraft, and $a(t)$ is the action
performed at time $t$. The action can take on three possible values: no alert, issue descend, and issue
climb. Once an action other than no alert is taken, subsequent values of $a$ have no effect upon
the evolution of the system. Hence, the model allows for at most one resolution advisory which, if
issued, remains in effect for the remainder of the scenario.


[Figure A-1: state diagram for $s_{RA}$. Nodes: "no alert"; the chain "climb in 4 s" -> "climb in 3 s" ->
"climb in 2 s" -> "climb in 1 s" -> "climb"; the chain "descend in 4 s" -> "descend in 3 s" ->
"descend in 2 s" -> "descend in 1 s" -> "descend"; and "clear of conflict".]

Figure A-1. The variable $s_{RA}$ tracks which RA has been issued and the delay.
The noise is a zero-mean white Gaussian process with time-invariant covariance $R$, that is,
$\mathbf{w}(t) \sim \mathcal{N}(\mathbf{0}, R)$. It is assumed that the noise in the vertical rates of the own aircraft and intruder,
$w_1(t)$ and $w_2(t)$, respectively, are uncorrelated and that they both have a standard deviation of
1 ft/s$^2$. The process noise enables the model to capture the stochastic nature of aircraft behavior.


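The equations of motion can be written as

$$h(t + \Delta t) = h(t) + \left(\dot{h}_2(t) - \dot{h}_1(t)\right)\Delta t + \tfrac{1}{2}\left(\ddot{h}_2(t) - \ddot{h}_1(t)\right)\Delta t^2, \tag{A-4}$$

$$\tau(t + \Delta t) = \tau(t) - \Delta t, \tag{A-5}$$

$$\dot{h}_1(t + \Delta t) = \phi_L\left(\dot{h}_1(t) + \ddot{h}_1(t)\Delta t\right), \tag{A-6}$$

$$\dot{h}_2(t + \Delta t) = \phi_L\left(\dot{h}_2(t) + \ddot{h}_2(t)\Delta t\right), \tag{A-7}$$

where the saturation function $\phi_L(y) = \max(-L, \min(L, y))$ ensures that the magnitude of either
vertical rate never exceeds the model parameter $L$. The time step $\Delta t$ is set to one second. The
acceleration noise values $w_1$ and $w_2$ are sampled from $\mathcal{N}(\mathbf{0}, R)$ every second. The applied vertical
accelerations are given by

$$\ddot{h}_1(t) = \begin{cases} -0.25\,g & \text{if } s_{RA}(t) = \text{descend and } \dot{h}_1(t) > -1500 \text{ ft/min}, \\ +0.25\,g & \text{if } s_{RA}(t) = \text{climb and } \dot{h}_1(t) < +1500 \text{ ft/min}, \\ w_1(t) & \text{otherwise}, \end{cases} \tag{A-8}$$

$$\ddot{h}_2(t) = w_2(t). \tag{A-9}$$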
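The update in Equations A-4 through A-9 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the report: the names `State`, `step`, and `simulate` are hypothetical, the RA state $s_{RA}$ is encoded (as an assumption) by a sense and a countdown that mirrors the chain in Figure A-1, and rates are converted from ft/min to ft/s.

```python
import random
from dataclasses import dataclass

G_FT_S2 = 32.174             # standard gravity, ft/s^2
RATE_LIMIT = 2500.0 / 60.0   # L = 2500 ft/min, expressed in ft/s
RA_TARGET = 1500.0 / 60.0    # 1500 ft/min, expressed in ft/s
RA_ACCEL = 0.25 * G_FT_S2    # 0.25 g, in ft/s^2
DT = 1.0                     # time step, s


@dataclass(frozen=True)
class State:
    h: float        # intruder altitude relative to own, ft
    tau: float      # time to closest approach, s
    hdot1: float    # own vertical rate, ft/s
    hdot2: float    # intruder vertical rate, ft/s
    sense: int = 0  # assumed encoding of s_RA: 0 no alert, +1 climb, -1 descend
    delay: int = 0  # assumed encoding of s_RA: seconds left before the pilot responds


def saturate(y, limit=RATE_LIMIT):
    """Saturation function phi_L."""
    return max(-limit, min(limit, y))


def own_acceleration(x, w1):
    """Equation A-8: RA acceleration once the delay has elapsed, noise otherwise."""
    responding = x.sense != 0 and x.delay == 0
    if responding and x.sense < 0 and x.hdot1 > -RA_TARGET:
        return -RA_ACCEL
    if responding and x.sense > 0 and x.hdot1 < RA_TARGET:
        return RA_ACCEL
    return w1


def step(x, w1, w2, action=0):
    """One application of f(x, w, a); action is 0 (no alert), +1 (climb), -1 (descend)."""
    a1 = own_acceleration(x, w1)
    a2 = w2                                   # Equation A-9
    if x.sense == 0:
        # Only the first alert matters; it enters the "... in 4 s" state.
        sense, delay = (action, 4) if action != 0 else (0, 0)
    else:
        sense, delay = x.sense, max(x.delay - 1, 0)
    return State(
        h=x.h + (x.hdot2 - x.hdot1) * DT + 0.5 * (a2 - a1) * DT ** 2,   # A-4
        tau=x.tau - DT,                                                  # A-5
        hdot1=saturate(x.hdot1 + a1 * DT),                               # A-6
        hdot2=saturate(x.hdot2 + a2 * DT),                               # A-7
        sense=sense,
        delay=delay,
    )


def simulate(x, policy, rng, sigma=1.0):
    """Roll the model forward until closest approach using uncorrelated noise."""
    while x.tau > 0:
        x = step(x, rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), policy(x))
    return x
```

For example, `simulate(State(500.0, 10.0, 0.0, 0.0), lambda s: 0, random.Random(0))` rolls out a ten-second encounter in which no alert is ever issued.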
APPENDIX B

ANALYTIC  APPROXIMATION 


Section  4.1  discusses  an  analytic  approximation  to  the  probability  of  conflict  for  the  model  of 
Appendix  A.  This  appendix  is  an  overview  of  the  linear-Gaussian  approximation  to  the  dynamics 
that  makes  this  analytic  approximation  possible. 

The system dynamics are governed by the hybrid, discrete-time Markov model of Equation A-2,
repeated here for convenience:

$$\mathbf{x}(t + \Delta t) = \mathbf{f}(\mathbf{x}(t), \mathbf{w}(t), a(t)), \tag{B-1}$$

where

$$\mathbf{w}(t) = \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} \tag{B-2}$$

is a random variable representing the noise in the vertical rates of the two aircraft, and $a(t)$ is the
action performed at time $t$. The dynamics can be approximated by the Gaussian system

$$\mathbf{x}(t + \Delta t) = A(\Delta t)\,\mathbf{x}(t) + G(\Delta t, \mathbf{x}(t))\,\mathbf{w}(t) + B(\Delta t)\,\mathbf{u}(t, \mathbf{x}(t)), \tag{B-3}$$

where

$$\mathbf{u}(t, \mathbf{x}(t)) = \begin{bmatrix} \ddot{h}_1(t) \\ -\Delta t \end{bmatrix} \tag{B-4}$$

is the control vector representing the control in the own vertical rate, $\ddot{h}_1$, and the rate at which
the time to closest approach $\tau$ decreases each time step, $-\Delta t$. When the pilot is responding to an
RA (after a five-second delay), $\ddot{h}_1$ is a 0.25-g acceleration in the RA direction; otherwise, $\ddot{h}_1$
is zero. Hence, $\mathbf{u}$ is a deterministic function of $\mathbf{x}(t)$.
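Outside of an RA response, the system dynamics (excluding $s_{RA}$) can be written

$$\begin{bmatrix} h(t + \Delta t) \\ \tau(t + \Delta t) \\ \dot{h}_1(t + \Delta t) \\ \dot{h}_2(t + \Delta t) \end{bmatrix} = \begin{bmatrix} 1 & 0 & -\Delta t & \Delta t \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} h(t) \\ \tau(t) \\ \dot{h}_1(t) \\ \dot{h}_2(t) \end{bmatrix} + \begin{bmatrix} -\tfrac{1}{2}\Delta t^2 & \tfrac{1}{2}\Delta t^2 \\ 0 & 0 \\ \Delta t & 0 \\ 0 & \Delta t \end{bmatrix} \begin{bmatrix} w_1(t) \\ w_2(t) \end{bmatrix} + \begin{bmatrix} -\tfrac{1}{2}\Delta t^2 & 0 \\ 0 & 1 \\ \Delta t & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \ddot{h}_1(t) \\ -\Delta t \end{bmatrix}. \tag{B-5}$$

During the RA response, there is no contribution from the noise $w_1(t)$. Therefore,

$$G(\Delta t, \mathbf{x}(t)) = \begin{bmatrix} 0 & \tfrac{1}{2}\Delta t^2 \\ 0 & 0 \\ 0 & 0 \\ 0 & \Delta t \end{bmatrix} \tag{B-6}$$

during RA response. Regardless of whether the pilot is responding to an RA, the mean of $\mathbf{x}(t + \Delta t)$
is

$$E[\mathbf{x}(t + \Delta t)] = A(\Delta t)\,E[\mathbf{x}(t)] + B(\Delta t)\,E[\mathbf{u}(t, \mathbf{x}(t))], \tag{B-7}$$

and the covariance of $\mathbf{x}(t + \Delta t)$ is

$$\operatorname{cov}[\mathbf{x}(t + \Delta t)] = A(\Delta t)\operatorname{cov}[\mathbf{x}(t)]A(\Delta t)^\top + G(\Delta t, \mathbf{x}(t))\,R\,G(\Delta t, \mathbf{x}(t))^\top. \tag{B-8}$$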

Both of these recursions depend upon the value of the random variable $\mathbf{x}(t)$. To obtain analytic
approximations $\boldsymbol{\mu}$ and $P$, the approximate mean and covariance of $\mathbf{x}(t)$, the approximate mean $\boldsymbol{\mu}(t)$
is substituted for the true random variable $\mathbf{x}(t)$, yielding recursions

$$\boldsymbol{\mu}(t + \Delta t) = A(\Delta t)\,\boldsymbol{\mu}(t) + B(\Delta t)\,\mathbf{u}(t, \boldsymbol{\mu}(t)), \tag{B-9}$$

$$P(t + \Delta t) = A(\Delta t)P(t)A(\Delta t)^\top + G(\Delta t, \boldsymbol{\mu}(t))\,R\,G(\Delta t, \boldsymbol{\mu}(t))^\top. \tag{B-10}$$

Hence, the system described by Equations B-9 and B-10 is a linear-Gaussian system that switches
between two modes, no-RA execution mode and RA execution mode. The only difference between
the two is a change in the matrix $G$.

This  analytic  solution  is  only  an  approximation.  The  actual  dynamics  are  not  linear-Gaussian 
for two reasons. First, the vertical rate saturates at ±2500 ft/min, causing the random accelerations
to  affect  the  state  in  a  generally  nonlinear  fashion.  Second,  accelerations  in  response  to  RAs  cause 
the  own  aircraft  to  transition  deterministically,  but  this  occurs  only  when  the  vertical  rate  is 
outside  the  target  range  of  the  RA,  a  nonlinear  dependence  on  the  current  state.  Representing 
the  distribution  of  the  state  as  a  multivariate  normal  fails  to  capture  exactly  how  the  state  evolves 
differently  in  different  regions  of  the  state  space. 

The method described here is only one of several approaches for approximately propagating
Gaussian distributions through nonlinear system dynamics. In particular, the sigma-point
sampling technique, described in Appendix D, offers an alternative means of transforming means and
covariances that may provide better results [113].
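As an illustration of the recursions in Equations B-9 and B-10, the following Python sketch propagates the approximate mean and covariance one step at a time. It is a minimal sketch under stated assumptions: the function name `propagate` and its arguments are hypothetical, the caller decides whether the pilot is currently responding (the $s_{RA}$ bookkeeping and the target-rate check are left out), and plain nested lists stand in for matrices.

```python
DT = 1.0  # time step, s

A = [[1.0, 0.0, -DT, DT],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
B = [[-0.5 * DT ** 2, 0.0],
     [0.0, 1.0],
     [DT, 0.0],
     [0.0, 0.0]]
G_NO_RA = [[-0.5 * DT ** 2, 0.5 * DT ** 2],
           [0.0, 0.0],
           [DT, 0.0],
           [0.0, DT]]
G_RA = [[0.0, 0.5 * DT ** 2],   # own-aircraft noise column zeroed (Equation B-6)
        [0.0, 0.0],
        [0.0, 0.0],
        [0.0, DT]]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def propagate(mu, P, R, responding=False, own_accel=0.0):
    """One step of Equations B-9 and B-10 for the state (h, tau, hdot1, hdot2)."""
    u = [[own_accel if responding else 0.0], [-DT]]
    G = G_RA if responding else G_NO_RA
    A_mu = matmul(A, [[m] for m in mu])
    B_u = matmul(B, u)
    mu_next = [A_mu[i][0] + B_u[i][0] for i in range(4)]
    APA = matmul(matmul(A, P), transpose(A))
    GRG = matmul(matmul(G, R), transpose(G))
    P_next = [[APA[i][j] + GRG[i][j] for j in range(4)] for i in range(4)]
    return mu_next, P_next
```

Starting from `mu = [h, tau, hdot1, hdot2]` and a zero covariance, repeated calls until `tau` reaches zero give the approximate distribution of the state at closest approach.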
APPENDIX C

INTERPOLATION  METHODS 


Interpolation can be viewed as approximating an unknown function $f$ given only values of
this function at a finite set of points $\mathbf{x}_1, \ldots, \mathbf{x}_n$. This is of particular interest because fitted value
iteration (Section 3.1) requires interpolating between estimates of the cost-to-go function at a set
of discrete states. There are many different interpolation schemes [114], but this appendix focuses
on the class of interpolation functions of the form

$$g(\mathbf{x}) = \sum_{k=1}^{n} \beta_k(\mathbf{x}) f(\mathbf{x}_k), \tag{C-1}$$

where $\beta_k$ is a weighting function, such that $\sum_{k=1}^{n} \beta_k(\mathbf{x}) = 1$. Generally, $\beta_k(\mathbf{x})$ should not increase as
the distance between $\mathbf{x}_k$ and $\mathbf{x}$ increases. There are several ways to define the weighting function
$\beta$, as discussed in this appendix. Figure C-2 compares the various interpolation methods on a
two-dimensional problem.

C.1 NEAREST-NEIGHBOR INTERPOLATION

The simplest approach is to assign all weight to the closest discrete state, resulting in a
piecewise constant function $g$. Another approach, which can result in a smoother $g$, involves finding
the $k$ nearest discrete states to $\mathbf{x}$ and assigning weight $1/k$ to each and zero to all other states.
Figure C-2(a) is an illustration of nearest-neighbor interpolation in two dimensions. Each black
cross represents a data point, and each color represents a different function value. Observe that
the estimates obtained by nearest-neighbor interpolation are particularly coarse, the interpolating
function being piecewise defined.
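The nearest-neighbor weighting can be sketched as follows. The function names are hypothetical, distance is assumed to be Euclidean, and ties are broken by the order of the data points, which is an arbitrary choice made only for this example.

```python
from math import dist


def knn_weights(x, points, k=1):
    """Weight 1/k on each of the k data points nearest to x, zero elsewhere."""
    nearest = sorted(range(len(points)), key=lambda i: dist(x, points[i]))[:k]
    return {i: 1.0 / k for i in nearest}


def knn_interpolate(x, points, values, k=1):
    """Evaluate g(x) = sum_k beta_k(x) f(x_k) with nearest-neighbor weights."""
    return sum(w * values[i] for i, w in knn_weights(x, points, k).items())


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 3.0, 4.0, 6.0]
nearest_value = knn_interpolate((0.9, 0.2), points, values, k=1)
two_nearest_value = knn_interpolate((0.9, 0.2), points, values, k=2)
```

With `k=1` the result is the value at the single closest point; with `k=2` it is the average of the two closest values.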

C.2  MULTILINEAR  INTERPOLATION 

An alternative approach is to use multilinear interpolation over the box (also called a
hyperrectangle or orthotope) in the grid that encloses the point $\mathbf{x}$ as shown by the notional diagram in
Figure C-1. The weights of the states at the vertices of the box are related to how close they are
to $\mathbf{x}$. The formula for calculating the weights in one dimension is

$$g(x) = \underbrace{\left(1 - \frac{x - x_1}{x_2 - x_1}\right)}_{\beta_1(x)} f(x_1) + \underbrace{\left(1 - \frac{x_2 - x}{x_2 - x_1}\right)}_{\beta_2(x)} f(x_2), \tag{C-2}$$

where $x_1$ is the vertex to the left of $x$ and $x_2$ is the vertex to the right of $x$. Multilinear
interpolation is a multidimensional generalization of the linear interpolation of Equation C-2. The
two-dimensional example in Figure C-1 results in the following estimate for $g(\mathbf{x})$:

$$g(\mathbf{x}) = (1/8) f(\mathbf{x}_1) + (1/8) f(\mathbf{x}_2) + (3/8) f(\mathbf{x}_3) + (3/8) f(\mathbf{x}_4). \tag{C-3}$$
[Figure C-1: notional diagram of a grid box with vertices $\mathbf{x}_1, \ldots, \mathbf{x}_4$ enclosing the point $\mathbf{x}$,
annotated with the weights
$\beta_1(\mathbf{x}) = (1 - 3/4)(1 - 1/2) = 1/8$,
$\beta_2(\mathbf{x}) = (1 - 3/4)(1 - 1/2) = 1/8$,
$\beta_3(\mathbf{x}) = (1 - 1/4)(1 - 1/2) = 3/8$,
$\beta_4(\mathbf{x}) = (1 - 1/4)(1 - 1/2) = 3/8$.]

Figure C-1. Multilinear interpolation.


Figure C-2(b) is an illustration of multilinear interpolation in two dimensions. Although the
function is not technically smooth, there is a greater amount of gradation than nearest-neighbor
interpolation.
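The multilinear weights can be computed for any number of dimensions by multiplying the one-dimensional weights of Equation C-2 along each axis. The sketch below assumes a regular grid described by hypothetical `lower` and `spacing` arguments and stores grid values in a dictionary keyed by integer vertex indices; none of these names come from the report.

```python
from itertools import product
from math import floor


def multilinear_weights(x, lower, spacing):
    """Map each vertex (integer index tuple) of the enclosing box to its weight."""
    cells, fracs = [], []
    for xi, lo, s in zip(x, lower, spacing):
        u = (xi - lo) / s
        i = floor(u)
        cells.append(i)        # index of the lower vertex along this axis
        fracs.append(u - i)    # fractional position inside the box, in [0, 1)
    weights = {}
    for corner in product((0, 1), repeat=len(x)):   # all 2^d vertices
        w = 1.0
        for bit, t in zip(corner, fracs):
            w *= t if bit else 1.0 - t
        weights[tuple(c + b for c, b in zip(cells, corner))] = w
    return weights


def multilinear_interpolate(x, lower, spacing, values):
    """Evaluate g(x) = sum_k beta_k(x) f(x_k) over the box vertices."""
    weights = multilinear_weights(x, lower, spacing)
    return sum(w * values[v] for v, w in weights.items() if w > 0.0)


# A point three quarters of the way along one axis and halfway along the other
# gives the weights 1/8, 1/8, 3/8, 3/8 of the kind shown in Figure C-1.
example_weights = multilinear_weights((0.75, 0.5), (0.0, 0.0), (1.0, 1.0))
grid = {(0, 0): 1.0, (1, 0): 3.0, (0, 1): 4.0, (1, 1): 6.0}   # f = 2x + 3y + 1
example_value = multilinear_interpolate((0.75, 0.5), (0.0, 0.0), (1.0, 1.0), grid)
```

Because the example grid values come from a linear function, the interpolated value equals that function evaluated at the query point.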


C.3  SIMPLEX-BASED  INTERPOLATION 

One potential issue with multilinear interpolation is that the number of vertices used for
interpolation grows exponentially with the dimensionality of the state space. If $d$ is the number of
dimensions, multilinear interpolation can use up to $2^d$ vertices to estimate the cost-to-go function
for a single point. As suggested by Davies [115], an alternative is to use simplex-based interpolation.
In the simplex method, the boxes are broken into $d!$ multidimensional triangles, called simplexes,
according to the Coxeter-Freudenthal-Kuhn triangulation [116]. Instead of interpolating over a
$d$-dimensional box with up to $2^d$ vertices, the simplex-based method interpolates over a simplex
defined by up to $d + 1$ vertices. Hence, the number of vertices used by the simplex method scales
linearly instead of exponentially with the dimensionality of the state space. However, multilinear
interpolation can provide higher quality estimates that can lead to better policies for the same grid
resolution.

Figure C-2(c) is an illustration of simplex interpolation in two dimensions. Simplex
interpolation provides higher quality estimates than nearest-neighbor interpolation while using fewer data
points than multilinear interpolation.
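One common way to realize simplex interpolation on a regular grid is to sort the fractional coordinates of the query point inside its box; the sort order selects one of the $d!$ simplexes of the Kuhn triangulation, and differences of consecutive sorted coordinates give the $d + 1$ weights. The sketch below follows that recipe. The function names and the `lower`/`spacing` grid description are assumptions made for the example, not notation from the report.

```python
from math import floor


def simplex_weights(x, lower, spacing):
    """Map each of the d + 1 simplex vertices (integer index tuples) to its weight."""
    cells, fracs = [], []
    for xi, lo, s in zip(x, lower, spacing):
        u = (xi - lo) / s
        i = floor(u)
        cells.append(i)
        fracs.append(u - i)
    # Visit the axes in order of decreasing fractional coordinate.
    order = sorted(range(len(x)), key=lambda i: -fracs[i])
    weights = {}
    vertex = list(cells)      # start at the lower corner of the box
    previous = 1.0
    for axis in order:
        weights[tuple(vertex)] = previous - fracs[axis]
        previous = fracs[axis]
        vertex[axis] += 1     # walk one edge of the box toward the upper corner
    weights[tuple(vertex)] = previous
    return weights


def simplex_interpolate(x, lower, spacing, values):
    weights = simplex_weights(x, lower, spacing)
    return sum(w * values[v] for v, w in weights.items() if w > 0.0)


grid = {(0, 0): 1.0, (1, 0): 3.0, (0, 1): 4.0, (1, 1): 6.0}   # f = 2x + 3y + 1
tri_weights = simplex_weights((0.75, 0.5), (0.0, 0.0), (1.0, 1.0))
tri_value = simplex_interpolate((0.75, 0.5), (0.0, 0.0), (1.0, 1.0), grid)
```

In two dimensions only three of the four box vertices receive weight, compared with all four for multilinear interpolation.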
C.4 LOCAL LAGRANGE INTERPOLATION

Lagrange interpolation is often used to interpolate functions of one-dimensional variables. Namely,
if $f(x_k)$, $k = 1, 2, \ldots, n$, are the training data, the Lagrange base polynomials are defined as

$$\ell_k(x) = \prod_{\substack{i=1 \\ i \neq k}}^{n} \frac{x - x_i}{x_k - x_i}, \tag{C-4}$$

from which the Lagrange polynomial that interpolates the data can be constructed as

$$g(x) = \sum_{k=1}^{n} \ell_k(x) f(x_k). \tag{C-5}$$

This is a one-dimensional interpolation scheme that is often applied globally, i.e., all data points
are used to construct the polynomial of Equation C-5. This is of limited value for interpolation of the
cost-to-go function of Section 3 because the state space is very large. Luo [117] proposed a local
multivariate Lagrange interpolation scheme. Let $P = (\mathbf{x}_k, f(\mathbf{x}_k))$, $k = 1, 2, \ldots, n$, represent the training data set,
and let $p = (\mathbf{x}_k, f(\mathbf{x}_k))$, $k = 1, 2, \ldots, N$, represent a subset of these data, where $p \subset P$, $N < n$.
Then the Lagrange base polynomials are


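$$\ell_k(\mathbf{x}) = \prod_{\substack{j=1 \\ j \neq k}}^{N} \frac{(\mathbf{x} - \mathbf{x}_j)^\top (\mathbf{x}_k - \mathbf{x}_j)}{(\mathbf{x}_k - \mathbf{x}_j)^\top (\mathbf{x}_k - \mathbf{x}_j)}. \tag{C-6}$$

The Lagrange interpolating polynomial is

$$g(\mathbf{x}) = \sum_{k=1}^{N} \phi_k(\mathbf{x}) f(\mathbf{x}_k), \tag{C-7}$$

where the normalized Lagrange base polynomials are

$$\phi_k(\mathbf{x}) = \frac{\ell_k(\mathbf{x})}{\sum_{j=1}^{N} \ell_j(\mathbf{x})}. \tag{C-8}$$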


Figure C-2(d) is an illustration of local Lagrange interpolation in two dimensions. The
function value at each point was interpolated using the data at the corners of the smallest square
in which the point is contained, thus requiring the same number of data points as multilinear
interpolation. However, the performance is noticeably different.

Figure C-2. Comparison of interpolation methods in two dimensions. The data points are indicated with
black crosses.
APPENDIX D
SAMPLING  METHODS 


If $X$ is a (possibly multidimensional) random variable with density $p$ and $f$ is a function of
$X$, then the expected value of the random variable $f(X)$ is

$$E_p[f(X)] = \int p(\mathbf{x}) f(\mathbf{x})\, d\mathbf{x}. \tag{D-1}$$

Calculating the expected value for a function of a random variable arises in several contexts:

• Applying the Bellman operator: The random variable $X$ represents the state at the next
decision stage, and $f$ is the cost-to-go function. (Section 3.1)

• Estimating the probability of conflict from the current state: The random variable
$X$ represents the future trajectories of the aircraft, and $f$ indicates whether an encounter
occurs. (Section 4.3)

• Evaluating the performance of a system: The random variable $X$ represents an
encounter, and $f$ is the performance metric used for evaluation (e.g., conflict probability).
(Section 5.4)

In general, it is not possible to evaluate the integral in Equation D-1 analytically. This section
outlines sampling methods for estimating $E_p[f(X)]$.

D.1 DIRECT MONTE CARLO

The direct Monte Carlo estimate of $E_p[f(X)]$ is given by

$$E_p[f(X)] \approx \frac{1}{N} \sum_{k=1}^{N} f(\mathbf{x}^{(k)}), \tag{D-2}$$

where $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)}$ are sampled directly from $p$. Although this estimate is unbiased, direct Monte
Carlo may require a large number of samples to provide adequate accuracy [118].
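A minimal sketch of the direct Monte Carlo estimate in Equation D-2 is shown below, using a standard normal density as a stand-in for $p$. The helper name `monte_carlo` and the two example functions are illustrative assumptions. The second example hints at why rare events are difficult: with ten thousand samples, an event of probability around $3 \times 10^{-5}$ is usually never observed.

```python
import random


def monte_carlo(f, sampler, n, rng):
    """Average f over n samples drawn directly from p via sampler(rng)."""
    return sum(f(sampler(rng)) for _ in range(n)) / n


rng = random.Random(0)


def standard_normal(r):
    return r.gauss(0.0, 1.0)


# E[X^2] = 1 for a standard normal; the estimate converges at rate 1/sqrt(N).
second_moment = monte_carlo(lambda x: x * x, standard_normal, 100_000, rng)

# A rare event: P(X > 4) is about 3.2e-5, so most runs of this size see no hits.
rare_event = monte_carlo(lambda x: 1.0 if x > 4.0 else 0.0, standard_normal, 10_000, rng)
```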

D.2  IMPORTANCE  SAMPLING 

Importance sampling is a technique for improving the accuracy of estimates [73, 119]. In importance
sampling, one draws samples from a new distribution $q$, called the proposal distribution, that favors
important samples, i.e., those that contribute most to $p(\mathbf{x}) f(\mathbf{x})$. Equation D-1 can be written as

$$E_p[f(X)] = \int p(\mathbf{x}) f(\mathbf{x})\, d\mathbf{x} = \int q(\mathbf{x}) \frac{p(\mathbf{x})}{q(\mathbf{x})} f(\mathbf{x})\, d\mathbf{x} = E_q\!\left[\frac{p(X)}{q(X)} f(X)\right], \tag{D-3}$$

where $p(\mathbf{x})/q(\mathbf{x})$ is called the likelihood ratio. The unbiased importance sampling estimator of $E_p[f(X)]$ is

$$\frac{1}{N} \sum_{k=1}^{N} \frac{p(\mathbf{x}^{(k)})}{q(\mathbf{x}^{(k)})} f(\mathbf{x}^{(k)}), \tag{D-4}$$

where $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)}$ are samples from $q$. In general, the proposal distribution must satisfy
$q(X) = 0 \Rightarrow f(X) p(X) = 0$ to be admissible. The optimal proposal distribution $q^*$ is proportional to
$|f(\mathbf{x})|\, p(\mathbf{x})$ [120].
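The estimator in Equation D-4 can be illustrated on a small rare-event problem: estimating $\Pr(X > 4)$ for a standard normal $X$ by sampling from a proposal centered on the boundary of the rare region. The helper names and the choice of proposal are assumptions for this example only; they are not the proposal distributions of Appendix E.

```python
import math
import random


def normal_pdf(x, mean=0.0, std=1.0):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling(f, p_pdf, q_pdf, q_sampler, n, rng):
    """Average of likelihood ratio times f over n samples drawn from q."""
    total = 0.0
    for _ in range(n):
        x = q_sampler(rng)
        total += p_pdf(x) / q_pdf(x) * f(x)
    return total / n


rng = random.Random(0)
estimate = importance_sampling(
    f=lambda x: 1.0 if x > 4.0 else 0.0,      # indicator of the rare event
    p_pdf=normal_pdf,                         # nominal density p = N(0, 1)
    q_pdf=lambda x: normal_pdf(x, mean=4.0),  # proposal q = N(4, 1)
    q_sampler=lambda r: r.gauss(4.0, 1.0),
    n=20_000,
    rng=rng,
)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed-form tail probability
```

About half of the proposal samples land in the rare region, so the estimate is far more stable than a direct Monte Carlo estimate with the same number of samples.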


D.3  SIGMA-POINT  SAMPLING 

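Sigma-point sampling involves generating deterministically chosen samples that capture statistics
of the distribution [113]. If $X$ is $m$-dimensional, there are $2m + 1$ sample points $\mathbf{x}^{(k)}$ with associated
weights $w^{(k)}$ defined by

$$\begin{aligned}
\mathbf{x}^{(k)} &= \boldsymbol{\mu}, & w^{(k)} &= \frac{\kappa}{m + \kappa}, & k &= 0, \\
\mathbf{x}^{(k)} &= \boldsymbol{\mu} + \left(\sqrt{(m + \kappa)\Sigma}\right)_k, & w^{(k)} &= \frac{1}{2(m + \kappa)}, & k &= 1, \ldots, m, \\
\mathbf{x}^{(k)} &= \boldsymbol{\mu} - \left(\sqrt{(m + \kappa)\Sigma}\right)_{k-m}, & w^{(k)} &= \frac{1}{2(m + \kappa)}, & k &= m + 1, \ldots, 2m,
\end{aligned} \tag{D-5}$$

where $\kappa$ is a scaling parameter and $\left(\sqrt{(m + \kappa)\Sigma}\right)_k$ is the $k$th row of the matrix square root of
$(m + \kappa)\Sigma$. The mean and covariance of $X$ are given by $\boldsymbol{\mu}$ and $\Sigma$, respectively. These sample points
are called sigma points. These weighted sigma points capture the true mean and covariance of $X$.
The sigma-point estimate is given by

$$E_p[f(X)] \approx \sum_{k=0}^{N-1} w^{(k)} f(\mathbf{x}^{(k)}), \tag{D-6}$$

where $N = 2m + 1$ [113, 121].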
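The construction in Equation D-5 can be sketched with a small Cholesky factorization. As an assumption for this example, the matrix square root is taken to be the lower Cholesky factor and its columns are used as offsets (equivalently, the rows of its transpose); other square-root conventions are possible. The function names are hypothetical.

```python
from math import sqrt


def cholesky(S):
    """Lower-triangular factor C with C C^T = S, for symmetric positive-definite S."""
    n = len(S)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][j] = sqrt(S[i][i] - s)
            else:
                C[i][j] = (S[i][j] - s) / C[j][j]
    return C


def sigma_points(mu, Sigma, kappa=1.0):
    """Return the 2m + 1 sigma points and weights of Equation D-5."""
    m = len(mu)
    scale = m + kappa
    C = cholesky([[scale * v for v in row] for row in Sigma])
    points = [list(mu)]
    weights = [kappa / scale]
    for sign in (1.0, -1.0):
        for k in range(m):
            points.append([mu[i] + sign * C[i][k] for i in range(m)])
            weights.append(1.0 / (2.0 * scale))
    return points, weights


def sigma_point_estimate(f, mu, Sigma, kappa=1.0):
    """Weighted sum of f over the sigma points (Equation D-6)."""
    points, weights = sigma_points(mu, Sigma, kappa)
    return sum(w * f(x) for x, w in zip(points, weights))


mu = [1.0, 2.0]
Sigma = [[4.0, 1.0], [1.0, 2.0]]
mean_0 = sigma_point_estimate(lambda x: x[0], mu, Sigma)             # first mean
second_00 = sigma_point_estimate(lambda x: x[0] * x[0], mu, Sigma)   # Sigma_00 + mu_0^2
cross_01 = sigma_point_estimate(lambda x: x[0] * x[1], mu, Sigma)    # Sigma_01 + mu_0 mu_1
```

The three estimates recover the mean and second moments implied by `mu` and `Sigma`, which is the sense in which the weighted sigma points capture the mean and covariance.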
APPENDIX E

PROPOSAL  DISTRIBUTIONS 


This appendix presents the importance sampling proposal distributions discussed in Section
4. An effective proposal distribution favors sample trajectories that result in conflict but that are
still likely to occur according to the system dynamics.


In the following discussion, let $\Pr(C \mid \mathbf{x}, a)$ represent the probability that a conflict will occur
from state $\mathbf{x}$ assuming action $a$ is continuously executed, and let $\widehat{\Pr}(C \mid \mathbf{x}, a)$ represent an estimate
of that probability. Let trajectory $T = (\mathbf{x}(t_0), \ldots, \mathbf{x}(t_K))$ be a random variable representing a
trajectory that is produced by starting at state $\mathbf{x}(t_0)$ and executing action $a$ until CPA. Let
$W = (\mathbf{w}(t_0), \ldots, \mathbf{w}(t_{K-1}))$ be a random variable representing the noise at each time step along the
trajectory. The noise $W$ maps uniquely to a trajectory $T$ according to Equation A-2. Let $p(T)$
and $p(W)$ represent the density of $T$ and of $W$, respectively. Define $C(T)$ as the conflict indicator
function:

$$C(T) = \begin{cases} 1 & \text{if } T \text{ results in a conflict}, \\ 0 & \text{otherwise}. \end{cases} \tag{E-1}$$

The probability of conflict is

$$\Pr(C \mid \mathbf{x}, a) = E_p[C(T)] = E_q\!\left[C(T) \frac{p(W)}{q(W)}\right], \tag{E-2}$$

for which the unbiased importance sampling estimator (Appendix D.2) is

$$\widehat{\Pr}(C \mid \mathbf{x}, a) = \frac{1}{N} \sum_{i=1}^{N} C(T^{(i)}) \frac{p(W^{(i)})}{q(W^{(i)})}, \tag{E-3}$$

where $W^{(1)}, \ldots, W^{(N)}$ are independent samples from $q$ and $T^{(i)}$ is the trajectory produced by $W^{(i)}$.

The density $p(W)$ can be written explicitly as

$$p(W) = p(\mathbf{w}(t_0))\, p(\mathbf{w}(t_1)) \cdots p(\mathbf{w}(t_{K-1})), \tag{E-4}$$

where each factor is the density of $\mathcal{N}(\mathbf{0}, R)$ and $R$ is the covariance of the noise. Moreover, the
proposal distribution $q(W)$ can be written as

$$q(W) = q(\mathbf{w}(t_0) \mid \mathbf{x}(t_0))\, q(\mathbf{w}(t_1) \mid \mathbf{x}(t_1)) \cdots q(\mathbf{w}(t_{K-1}) \mid \mathbf{x}(t_{K-1})). \tag{E-5}$$

The importance sampling estimator for $\Pr(C \mid \mathbf{x}, a)$ becomes

$$\widehat{\Pr}(C \mid \mathbf{x}, a) = \frac{1}{N} \sum_{i=1}^{N} C(T^{(i)}) \prod_{k=0}^{K-1} \frac{p(\mathbf{w}^{(i)}(t_k))}{q(\mathbf{w}^{(i)}(t_k) \mid \mathbf{x}^{(i)}(t_k))}, \tag{E-6}$$

where $\mathbf{w}^{(i)}(t_k)$ is the noise at time $t_k$ for sample trajectory $i$ and $\mathbf{x}^{(i)}(t_k)$ is the state at time $t_k$ for
sample trajectory $i$. An effective proposal distribution $q(\mathbf{w}^{(i)}(t_k) \mid \mathbf{x}^{(i)}(t_k))$ from which to sample
the noise at each time step is one in which the sample trajectory resulting from the noise will result
in a conflict with high probability. This process is known as sequential, or dynamic, importance
sampling. The proposal distribution is state-dependent because the distribution from which to
sample the noise to produce conflicts is a function of how far away the state is from conflict at each
time.


The following sections describe several proposal distributions. In all of the descriptions,
let $\mathbf{x}(t)$ represent the state at time $t$ in a sample trajectory for which a proposal distribution
$q(\mathbf{w}(t) \mid \mathbf{x}(t))$ is desired.

E.1 CONSTANT ACCELERATION PROPOSAL

If $\mathbf{x}(t)$ is not a state corresponding to an RA execution, both ownship and intruder accelerations
can be applied to artificially force the next state closer to a conflict. This process continues until
CPA. The resulting trajectory should result in a conflict with high probability. If, on the other
hand, $\mathbf{x}(t)$ is a state corresponding to an RA execution, the ownship acceleration is dictated by the
issued RA, and therefore only the intruder acceleration can be changed to cause a conflict.

A conflict can be induced by applying acceleration at each time step that reduces the projected
vertical separation at CPA to zero. A more sophisticated way of inducing conflict is to force the
trajectory to reach the vertical separation most likely to occur while executing the action but that
still qualifies as a conflict. More explicitly, let $h_{\text{proj}}$ represent the projected vertical separation
at CPA achieved by the noiseless trajectory starting from $\mathbf{x}(t)$ and executing action $a$. Then the
targeted vertical separation $h_{\text{target}}$ should be 100 ft if $h_{\text{proj}} > 100$ ft, $-100$ ft if $h_{\text{proj}} < -100$ ft, and
$h_{\text{proj}}$ otherwise. That is,

$$h_{\text{target}} = \begin{cases} +100 \text{ ft} & \text{if } h_{\text{proj}} > 100 \text{ ft}, \\ -100 \text{ ft} & \text{if } h_{\text{proj}} < -100 \text{ ft}, \\ h_{\text{proj}} & \text{otherwise}. \end{cases} \tag{E-7}$$

Forcing the trajectory to reach the closest point in the conflict region prevents the likelihood ratios
from becoming too small. Small likelihood ratios indicate that the proposal distribution $q$ favors
samples in low-probability regions of $p$. This generally is not preferable because, as the form of the
optimal proposal distribution suggests, an effective proposal distribution should be as similar to $p$
as possible while still favoring trajectories that result in conflict.

Consider the case when no RA is currently being executed. If constant accelerations $\ddot{h}_1$ and
$\ddot{h}_2$ are applied to the own and intruder aircraft, respectively, in order to achieve a relative altitude
of $h_{\text{target}}$ at CPA, the accelerations must satisfy

$$\tfrac{1}{2}\left(\ddot{h}_2 - \ddot{h}_1\right)\tau^2 + h_{\text{proj}} = h_{\text{target}}, \tag{E-8}$$

$$\ddot{h}_2 - \ddot{h}_1 = \frac{2\left(h_{\text{target}} - h_{\text{proj}}\right)}{\tau^2}. \tag{E-9}$$

Equation E-9 defines a set of infinitely many combinations of $\ddot{h}_1$ and $\ddot{h}_2$ that could be applied to
reach $h_{\text{target}}$. However, one would like to keep the accelerations close to the mean of the distribution
$p(\mathbf{w}(t))$ to prevent the likelihood ratios from becoming too small. Because the mean of $p(\mathbf{w}(t))$ is
zero, the task of finding the accelerations closest to the mean reduces to that of finding the point
on the line $\ddot{h}_2 = \ddot{h}_1 + 2(h_{\text{target}} - h_{\text{proj}})/\tau^2$ closest to the origin. It is straightforward to show that
the solution is

$$\ddot{h}_1 = \frac{h_{\text{proj}} - h_{\text{target}}}{\tau^2}, \tag{E-10}$$

$$\ddot{h}_2 = -\ddot{h}_1. \tag{E-11}$$

Sampling from a proposal distribution $q(\mathbf{w}(t) \mid \mathbf{x}(t))$ that is a Gaussian with mean as given by
Equations E-10 and E-11 and covariance identical to $p(\mathbf{w}(t))$, namely $R$, should result in a lower
variance estimate of the probability of conflict. Figure E-1 compares the distribution $p$ with the
proposal distribution $q$. The form of the proposal distribution need not be Gaussian; any distribution
that favors samples close to Equations E-10 and E-11 will work suitably. Moreover, it is crucial
to note that although the mean of the proposal distribution consists of accelerations which, if applied
constantly until CPA, would lead to a conflict, the accelerations are only applied for one time step,
after which the next state is sampled and the calculations repeated.

Figure E-1. Comparison of distributions $p$ and $q$. Shown are error ellipses for the two distributions. The
distribution $p$ is centered at the origin, while $q$ is centered at the point on the line (Equation E-9) closest to
the origin.

Now consider the case when an RA is being executed. Only the intruder acceleration can be
controlled to move closer to a conflict. Analogous to Equations E-8 and E-9, the requisite intruder
acceleration is

$$\tfrac{1}{2}\ddot{h}_2\tau^2 + h_{\text{proj}} = h_{\text{target}}, \tag{E-12}$$

$$\ddot{h}_2 = \frac{2\left(h_{\text{target}} - h_{\text{proj}}\right)}{\tau^2}. \tag{E-13}$$

Intuitively, this is exactly twice that of the intruder acceleration when no RA is being executed.
If the accelerations remain uncorrelated in the proposal distribution, the proposal distribution can
be written as the product of the marginals

$$q(\mathbf{w}(t) \mid \mathbf{x}(t)) = q(w_1(t) \mid \mathbf{x}(t))\, q(w_2(t) \mid \mathbf{x}(t)). \tag{E-14}$$

Since no change is made to the own acceleration, it is convenient to set $q(w_1(t) \mid \mathbf{x}(t)) = p(w_1(t))$
to simplify the likelihood ratio calculation. The marginal distribution $q(w_2(t) \mid \mathbf{x}(t))$ is Gaussian
with mean given by Equation E-13 and variance $R_{22}$.
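The mean of the constant acceleration proposal (Equations E-7, E-10, E-11, and E-13) reduces to a few arithmetic steps, sketched below. The function names are hypothetical, the projected separation `h_proj` is assumed to be supplied by the caller, and the returned pair is (own, intruder) acceleration mean in ft/s$^2$ for `tau` in seconds.

```python
def target_separation(h_proj, threshold=100.0):
    """Equation E-7: closest point of the conflict region to the projection."""
    return max(-threshold, min(threshold, h_proj))


def constant_accel_proposal_mean(h_proj, tau, ra_active=False):
    """Mean of q(w | x) for the constant acceleration proposal."""
    delta_h = target_separation(h_proj) - h_proj
    if ra_active:
        # Own acceleration is dictated by the RA, so w1 keeps its nominal
        # zero mean and the intruder carries the full correction (E-13).
        return 0.0, 2.0 * delta_h / tau ** 2
    own = -delta_h / tau ** 2          # Equation E-10
    return own, -own                   # Equation E-11


no_ra_mean = constant_accel_proposal_mean(300.0, 10.0)
ra_mean = constant_accel_proposal_mean(300.0, 10.0, ra_active=True)
```

In both cases, applying the mean accelerations constantly for `tau` seconds would move the projected separation from 300 ft to the 100 ft boundary of the conflict region.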


E.2  MAXIMUM-LIKELIHOOD  ACCELERATION  PROPOSAL 


An alternative proposal distribution can be obtained as follows. Again, the RA execution and
no-RA execution cases must be considered separately. The latter case is considered first. Previously,
the requisite constant accelerations needed to arrive in the conflict region from an arbitrary $\mathbf{x}(t)$
were derived. Now let $(\ddot{h}_1(t_0), \ddot{h}_2(t_0), \ldots, \ddot{h}_1(t_{K-1}), \ddot{h}_2(t_{K-1}))$ denote an acceleration sequence of
the aircraft starting from $\mathbf{x}(t)$. That is, $\ddot{h}_1(t_0)$ and $\ddot{h}_2(t_0)$ are applied for the first time step, $\ddot{h}_1(t_1)$
and $\ddot{h}_2(t_1)$ are applied for the second time step, and so on until CPA. If it is desired that the
aircraft achieve a relative altitude of $h_{\text{target}}$ after $\tau$ seconds starting from state $\mathbf{x}(t)$, the control
history must satisfy

$$\begin{aligned} h_{\text{target}} = h_{\text{proj}} &+ \left(\ddot{h}_2(t_0) - \ddot{h}_1(t_0)\right)(\tau - 1) + \cdots + \left(\ddot{h}_2(t_{K-2}) - \ddot{h}_1(t_{K-2})\right) \\ &+ \tfrac{1}{2}\left(\ddot{h}_2(t_0) - \ddot{h}_1(t_0)\right) + \cdots + \tfrac{1}{2}\left(\ddot{h}_2(t_{K-1}) - \ddot{h}_1(t_{K-1})\right), \end{aligned} \tag{E-15}$$

where $\Delta t = 1$ s was used. The objective is to find the accelerations that both satisfy Equation
E-15 and that produce a trajectory that is most likely to occur by executing action $a$. This
is equivalent to minimizing the difference between the trajectory produced by the accelerations
$(\ddot{h}_1(t_0), \ddot{h}_2(t_0), \ldots, \ddot{h}_1(t_{K-1}), \ddot{h}_2(t_{K-1}))$ and the noiseless trajectory.


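Begin by defining

$$\mathbf{u} = \begin{bmatrix} \ddot{h}_1(t_0) \\ \ddot{h}_2(t_0) \\ \ddot{h}_1(t_1) \\ \ddot{h}_2(t_1) \\ \vdots \\ \ddot{h}_1(t_{K-1}) \\ \ddot{h}_2(t_{K-1}) \end{bmatrix}, \tag{E-16}$$

$$\mathbf{a} = \begin{bmatrix} -(\tau - \tfrac{1}{2}) \\ \tau - \tfrac{1}{2} \\ -(\tau - \tfrac{3}{2}) \\ \tau - \tfrac{3}{2} \\ \vdots \\ -\tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix}, \tag{E-17}$$

and $\Delta h = h_{\text{target}} - h_{\text{proj}}$. Then Equation E-15 can be expressed as

$$\mathbf{a}^\top \mathbf{u} = \Delta h. \tag{E-18}$$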
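Since the accelerations are nominally zero-mean Gaussian, the most likely trajectory that achieves
the desired vertical separation is the one that minimizes the square norm of the acceleration vector
$\mathbf{u}$ subject to the constraint in Equation E-18. That is, the most likely accelerations are

$$\mathbf{u}^* = \operatorname*{arg\,min}_{\mathbf{u}\,:\,\mathbf{a}^\top \mathbf{u} = \Delta h} \tfrac{1}{2}\|\mathbf{u}\|^2. \tag{E-19}$$

This is a quadratic programming problem that can be easily solved using the method of Lagrange
multipliers [122]. If $f(\mathbf{u}) = \tfrac{1}{2}\|\mathbf{u}\|^2$ is the objective to be minimized and $g(\mathbf{u}) = \mathbf{a}^\top \mathbf{u} - \Delta h$ is the
linear constraint, then one must find $\mathbf{u}$ that satisfies

$$\nabla_{\mathbf{u}} f(\mathbf{u}) - \lambda \nabla_{\mathbf{u}} g(\mathbf{u}) = 0, \tag{E-20}$$

where $\lambda$ is the Lagrange multiplier. This equation easily simplifies to $\mathbf{u} - \lambda \mathbf{a} = 0$ or $\mathbf{u} = \lambda \mathbf{a}$.
Multiplying both sides by $\mathbf{a}^\top$ yields $\mathbf{a}^\top \mathbf{u} = \Delta h = \lambda \|\mathbf{a}\|^2$. Thus $\lambda = \Delta h / \|\mathbf{a}\|^2$ and

$$\mathbf{u} = \frac{\Delta h}{\|\mathbf{a}\|^2}\, \mathbf{a}. \tag{E-21}$$

The square norm of $\mathbf{a}$ is

$$\|\mathbf{a}\|^2 = 2 \sum_{t=1/2}^{\tau - 1/2} t^2, \tag{E-22}$$

where the sum runs over $t = 1/2, 3/2, \ldots, \tau - 1/2$. It follows that the first accelerations $\ddot{h}_1$ and $\ddot{h}_2$
to apply to state $\mathbf{x}(t)$ are

$$\ddot{h}_1 = \frac{(1 - 2\tau)\,\Delta h}{2\|\mathbf{a}\|^2}, \tag{E-23}$$

$$\ddot{h}_2 = -\ddot{h}_1. \tag{E-24}$$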

(E-24) 


As before, the proposal distribution is a Gaussian with mean given by Equations E-23 and E-24.
It is straightforward to show that the marginal distribution of $w_2(t)$ of the proposal distribution
when executing an RA has a mean given by

$$\ddot{h}_2 = \frac{(2\tau - 1)\,\Delta h}{\|\mathbf{a}\|^2}, \tag{E-25}$$

with $\|\mathbf{a}\|^2$ as in Equation E-22, which is exactly double that of Equation E-24.

The  optimization  performed  here  happened  to  admit  an  analytic  solution  due  to  the  simplicity 
of  the  problem.  However,  solving  Equation  E-19  typically  requires  approximation  techniques  such 
as  gradient  descent  or  a  Newton-like  method  to  minimize  the  objective  [122]. 
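The closed-form solution of Equation E-21 can be checked numerically. The sketch below builds the constraint vector for the no-RA case, computes the minimum-norm accelerations, and replays them through the one-second update to confirm that the relative altitude changes by $\Delta h$. The function names are hypothetical, $K = \tau$ whole one-second steps is assumed, and saturation of the vertical rates is ignored.

```python
def constraint_vector(K):
    """Coefficients of (own, intruder) accelerations at each step, as in E-15."""
    a = []
    for k in range(K):
        c = K - k - 0.5        # remaining lever arm of the acceleration at step k
        a.extend([-c, c])
    return a


def most_likely_accelerations(delta_h, K):
    """Minimum-norm solution u = delta_h * a / ||a||^2 (Equation E-21)."""
    a = constraint_vector(K)
    norm_sq = sum(v * v for v in a)
    return [delta_h * v / norm_sq for v in a]


def replay(u):
    """Change in relative altitude when the sequence u is applied with dt = 1 s."""
    rel_pos, rel_rate = 0.0, 0.0
    for own, intruder in zip(u[0::2], u[1::2]):
        rel_acc = intruder - own
        rel_pos += rel_rate + 0.5 * rel_acc
        rel_rate += rel_acc
    return rel_pos


u = most_likely_accelerations(delta_h=-200.0, K=10)
first_own, first_intruder = u[0], u[1]
achieved = replay(u)
```

The first pair of accelerations is the proposal mean for the current time step; as with the constant acceleration proposal, only that first pair would be used before the calculation is repeated from the next sampled state.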


E.3  ANALYTIC  PROPOSAL 

Although there are analytic methods for computing the probability of conflict for some simple models [68], analytic solutions do not always exist. Even for the relatively simple encounter model of Appendix A, the expression for the probability of conflict quickly becomes intractable, as the number of random accelerations that contribute to the state at CPA depends upon the vertical rate at all preceding times. However, a reasonably accurate approximation to the analytic solution that is feasible to evaluate can be derived as follows.

In Appendix B, the following approximations to the mean and covariance updates of the aircraft state were derived:

$$\mu(t + \Delta t) = A(\Delta t)\,\mu(t) + B(\Delta t)\,u(t), \tag{E-26}$$

$$P(t + \Delta t) = A(\Delta t)\,P(t)\,A(\Delta t)^\top + G(\Delta t, \mu(t))\,R\,G(\Delta t, \mu(t))^\top. \tag{E-27}$$

The probability of a conflict, $\Pr(C \mid x, a)$, can then be estimated by initializing the mean $\mu(t_0) = x$ and the covariance $P(t_0) = 0$ and applying the recursions in Equations E-26 and E-27 to obtain $\mu(t_K)$ and $P(t_K)$, the approximate mean and covariance of the state $x(t_K)$ at CPA. The marginal distribution for $h$ at CPA is Gaussian with mean equal to the first element in $\mu(t_K)$ and variance equal to the first element in $P(t_K)$. To determine the probability of conflict, one simply integrates the density from $-100$ ft to $+100$ ft.
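
The final integration step can be sketched in a few lines of Python. This is a minimal sketch, assuming the mean and variance of $h$ at CPA have already been obtained from the recursions. The function names and the `half_width` parameter are illustrative, and the zero-variance branch is an added convenience for the degenerate case where no noise has accumulated.

```python
import math


def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian with the given mean and std."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def conflict_probability(mean_h, var_h, half_width=100.0):
    """Probability that h at CPA lies within [-half_width, +half_width] feet."""
    if var_h <= 0.0:
        # Degenerate case: all probability mass sits at the mean.
        return 1.0 if -half_width <= mean_h <= half_width else 0.0
    std = math.sqrt(var_h)
    return gaussian_cdf(half_width, mean_h, std) - gaussian_cdf(-half_width, mean_h, std)


# Hypothetical inputs: zero mean separation with a 100 ft standard deviation.
p_conflict = conflict_probability(0.0, 100.0 ** 2)
```

With a zero mean and a standard deviation equal to the half-width, the result is the familiar one-sigma mass of a Gaussian, roughly 0.68.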

It can be shown that the probability of a conflict from a state $x$ not in the conflict region when executing action $a$ is $\Pr(C \mid x, a) = \mathbb{E}_{\Pr(x' \mid x, a)}[\Pr(C \mid x', a)]$. This may be written

$$\mathbb{E}_{\Pr(x' \mid x, a)}[\Pr(C \mid x', a)] = \int \Pr(C \mid f(x, w, a), a)\, p(w)\, dw, \tag{E-28}$$

where $x' = f(x, w, a)$ is the successor state from $x$ when taking action $a$ with noise $w$. Suppose $\Pr(C \mid x', a)$ is known for all possible successor states $x'$. In such a situation, one could estimate the probability of conflict $\Pr(C \mid x)$ by producing samples of the immediate noise $w$. It can be shown that the optimal proposal distribution from which to draw such samples is

$$q^*(w \mid x) \propto \Pr(C \mid f(x, w, a), a) \cdot p(w). \tag{E-29}$$

The mean of this optimal proposal distribution can be approximated by drawing $N = 2n_w + 1$ sigma-point samples $w^{(i)}$ from $p(w)$ with associated weights $\omega^{(i)}$. For each sample, define

$$p^{(i)} = \Pr(C \mid f(x, w^{(i)}, a), a), \tag{E-30}$$

which can be estimated using the analytic approximation to $\Pr(C)$ described above. It follows that

$$\mathbb{E}_{q^*}[w] = \int w \cdot q^*(w \mid x)\, dw \approx \sum_{i=1}^{N} \eta^{(i)} w^{(i)}, \tag{E-31}$$

where

$$\eta^{(i)} = \frac{\omega^{(i)} p^{(i)}}{\sum_{k=1}^{N} \omega^{(k)} p^{(k)}} \tag{E-32}$$

are the normalized weights. Thus, this approximation of the mean of the optimal proposal distribution $q^*$ becomes the mean of a Gaussian proposal distribution. As with the preceding proposal distributions, the covariance was chosen arbitrarily to be the same as that of $p(w)$. This proposal distribution favors samples that have both a high probability of conflict and a high probability of occurring under the model. Thus, the analytic probability of conflict approximation is used as a heuristic to guide the sample trajectories toward conflict.
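
The weighted average that defines the proposal mean can be sketched as follows. This is a minimal Python sketch, assuming the sigma-point samples, their weights, and the per-sample conflict probabilities are already available as lists. The function name `proposal_mean` is illustrative, and raising an error when every conflict probability is zero is a choice made here, since the normalized weights are undefined in that case.

```python
def proposal_mean(samples, weights, conflict_probs):
    """Approximate the mean of the optimal proposal from sigma points.

    samples: list of noise vectors (each a list of floats)
    weights: sigma-point weights, one per sample
    conflict_probs: estimated Pr(C) for the successor state of each sample
    """
    scores = [w * p for w, p in zip(weights, conflict_probs)]
    total = sum(scores)
    if total == 0.0:
        raise ValueError("all weighted conflict probabilities are zero")
    normalized = [s / total for s in scores]
    dim = len(samples[0])
    return [sum(n * sample[d] for n, sample in zip(normalized, samples))
            for d in range(dim)]


# Hypothetical one-dimensional example with three sigma points.
samples = [[-1.0], [0.0], [1.0]]
weights = [0.25, 0.5, 0.25]
conflict_probs = [0.0, 0.2, 0.8]
mean = proposal_mean(samples, weights, conflict_probs)
```

In this example the sample with the highest conflict probability pulls the mean toward $+1$, which is the intended biasing effect.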
E.4 DYNAMIC PROGRAMMING PROPOSAL


The dynamic programming proposal distribution is identical to the analytic proposal distribution except that the dynamic programming estimate of the probability of conflict (given in Section 4.2) is used to determine the mean of the distribution.


APPENDIX F

PROOF  OF  PARETO  OPTIMALITY 


Suppose that the immediate cost function is of the form

$$c(s, a) = \sum_i \lambda_i f_i(s, a), \tag{F-1}$$

where $\lambda_i > 0$ and $f_i(s, a) \in \{0, 1\}$. In addition, when following any policy, $f_i(s, a)$ may be 1 only once. The function $f_i(s, a)$ can be thought of as an indicator of whether event $i$ occurs at state $s$ when taking action $a$. In the hypothetical collision avoidance problem, there are two events: a conflict occurs for the first time and an alert is issued for the first time.

The cost-to-go function when using an immediate cost function of the form specified in Equation F-1 satisfies:

$$J^\pi(s) = \mathbb{E}\Big[\sum_t \sum_i \lambda_i f_i(s_t, \pi(s_t)) \,\Big|\, s_0 = s, \pi\Big] = \sum_i \lambda_i\, \mathbb{E}\Big[\sum_t f_i(s_t, \pi(s_t)) \,\Big|\, s_0 = s, \pi\Big] = \sum_i \lambda_i\, p_i(\pi, s), \tag{F-2}$$

where $s_t$ is the state at time $t$ and $s_0$ is a random variable specifying the initial state. The function $p_i(\pi, s)$ is the probability of event $i$ occurring when starting in state $s$ and following policy $\pi$.

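Assuming some distribution $b$ over starting states, the probability that event $i$ occurs when following $\pi$ is given by

$$p_i(\pi) = \sum_s b(s)\, p_i(\pi, s). \tag{F-3}$$

The expected cost of a policy $\pi$ is

$$J(\pi) = \sum_s b(s)\, J^\pi(s). \tag{F-4}$$

It follows that

$$J(\pi) = \sum_i \lambda_i\, p_i(\pi). \tag{F-5}$$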


Value iteration finds an optimal policy $\pi^*$ such that $J(\pi^*) \le J(\pi)$ holds for all policies $\pi$. It can be shown that an optimal policy with respect to a cost function $\lambda = \{\lambda_1, \ldots, \lambda_n\}$ is Pareto optimal with respect to $p_1, \ldots, p_n$, as stated by the following proposition.

Proposition 1. Suppose that an optimal policy $\pi^*$ is known for a given cost function. For any policy $\pi$ and any $j$, if $p_j(\pi) < p_j(\pi^*)$ then there exists some $i$ such that $p_i(\pi) > p_i(\pi^*)$.

Proof. For a contradiction, assume that $p_j(\pi) < p_j(\pi^*)$ and $p_i(\pi) \le p_i(\pi^*)$ for all $i$. It follows that, for all $\lambda$,

$$J(\pi) = \lambda_j p_j(\pi) + \sum_{i \ne j} \lambda_i p_i(\pi) < \lambda_j p_j(\pi^*) + \sum_{i \ne j} \lambda_i p_i(\pi^*) = J(\pi^*). \tag{F-6}$$

Hence, $J(\pi) < J(\pi^*)$. If $\lambda$ is chosen to be the cost function for which $\pi^*$ is optimal, then $J(\pi^*) \le J(\pi)$, which is a contradiction. □

There  are  two  important  consequences  of  Pareto  optimality. 

1.  Given  an  optimal  policy,  there  is  no  other  policy  with  the  same  alert  rate  and  a  lower  conflict 
rate. 

2.  Given  an  optimal  policy,  there  is  no  other  policy  with  the  same  conflict  rate  and  a  lower  alert 
rate. 

Hence,  the  system  operating  characteristic  curve  (Section  5.3)  traced  by  optimal  policies  with 
varying  alert  cost  will  never  be  to  the  right  or  below  that  of  any  other  policy. 
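
The proposition can be illustrated on a small set of candidate policies. The following is a minimal Python sketch, assuming each policy is summarized by a hypothetical pair (probability of conflict, probability of alert). The policy names, probabilities, and cost weights are invented for illustration and are not results from the report.

```python
def expected_cost(lams, probs):
    """Expected cost as in Equation F-5: sum of lambda_i * p_i."""
    return sum(lam * p for lam, p in zip(lams, probs))


def dominates(p, q):
    """True if p is no worse than q on every event and strictly better on one."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


# Hypothetical (Pr(conflict), Pr(alert)) pairs for four candidate policies.
candidates = {
    "A": (0.010, 0.50),
    "B": (0.020, 0.20),
    "C": (0.050, 0.05),
    "D": (0.030, 0.40),
}
lams = (10.0, 1.0)  # hypothetical positive event costs

# The policy minimizing the weighted cost plays the role of the optimal policy.
best = min(candidates, key=lambda name: expected_cost(lams, candidates[name]))
best_is_dominated = any(
    dominates(probs, candidates[best])
    for name, probs in candidates.items()
    if name != best
)
```

Here the cost-minimizing candidate is not dominated by any other candidate, as the proposition requires, while a candidate that is dominated (D in this example) can never be the minimizer for positive weights.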




APPENDIX  G 

COMPUTING  THE  TIME-TO-CONFLICT  DISTRIBUTION 


Section 7.1 describes a method to efficiently handle dynamics in three spatial dimensions that involves estimating a distribution over $\tau$, the time to horizontal conflict. This appendix explains how to compute the time-to-conflict distribution for an arbitrary Markov process using dynamic programming. For simplicity, assume that time is discretized to one-second intervals and that the time horizon is $\tau_{\max}$. The probability of transitioning from $s$ to $s'$ is given by $\Pr(s' \mid s)$.

The algorithm uses $D_s(\tau)$ to represent the probability that the time-to-conflict falls within the interval $[\tau, \tau + 1)$ when starting at state $s$. If $\tau = \tau_{\max}$, then $D_s(\tau)$ represents the probability that the time-to-conflict falls within the interval $[\tau_{\max}, \infty)$. The algorithm uses arrays of length $\tau_{\max} + 1$ (indices $0$ through $\tau_{\max}$) to represent $D_s$ for each state $s$. The arrays are initialized as follows:

$$D_s(\tau) = \begin{cases} 1 & \text{if } s \in C \text{ and } \tau = 0, \\ 1 & \text{if } s \notin C \text{ and } \tau = \tau_{\max}, \\ 0 & \text{otherwise,} \end{cases} \tag{G-1}$$

where $s \in C$ means that $s$ is a conflict state.

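Once $D_s$ is initialized, a copy is stored in $D'_s$. For each $s \notin C$, $D'_s$ is updated as follows:

$$D'_s(\tau) = \begin{cases} 0 & \text{if } \tau = 0, \\ \sum_{s'} \Pr(s' \mid s)\, D_{s'}(\tau - 1) & \text{if } 0 < \tau < \tau_{\max}, \\ \sum_{s'} \Pr(s' \mid s)\, \big(D_{s'}(\tau - 1) + D_{s'}(\tau)\big) & \text{otherwise.} \end{cases} \tag{G-2}$$

This update process assigns to $D'_s$ the weighted sum of the histograms represented by $D_{s'}$ shifted to the right one second, where the weights are determined by $\Pr(s' \mid s)$.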

After $D'_s$ is computed for all states, it is copied to $D_s$. This process is repeated $\tau_{\max}$ times, after which $D_s$ will represent the true distribution over $\tau$, truncated at $\tau_{\max}$. If there are $n$ discrete states and there are a maximum of $k$ successor states, the time complexity of the algorithm is $O(n k \tau_{\max}^2)$.
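
The initialization and update steps described in this appendix can be sketched as follows. This is a minimal Python sketch, assuming the Markov process is given as a dictionary mapping each state to its successor probabilities, with every successor also present as a key. The function name `time_to_conflict` and the two-state example are illustrative.

```python
def time_to_conflict(transitions, conflict, tau_max):
    """Compute the truncated time-to-conflict histogram for each state.

    transitions[s] maps each successor s2 to Pr(s2 | s).
    conflict is the set of conflict states.
    Returns a dict mapping each state to a list of length tau_max + 1.
    """
    if tau_max < 1:
        raise ValueError("tau_max must be at least 1")
    # Initialization: conflict states put all mass at 0, others at tau_max.
    D = {}
    for s in transitions:
        hist = [0.0] * (tau_max + 1)
        hist[0 if s in conflict else tau_max] = 1.0
        D[s] = hist
    for _ in range(tau_max):
        D_new = {s: list(hist) for s, hist in D.items()}  # working copy
        for s, successors in transitions.items():
            if s in conflict:
                continue  # conflict states keep their initial histogram
            hist = [0.0] * (tau_max + 1)
            for s2, prob in successors.items():
                # Shift each successor histogram right by one second.
                for tau in range(1, tau_max):
                    hist[tau] += prob * D[s2][tau - 1]
                # The last bin absorbs everything at or beyond tau_max.
                hist[tau_max] += prob * (D[s2][tau_max - 1] + D[s2][tau_max])
            D_new[s] = hist
        D = D_new
    return D


# Hypothetical chain: from "a", conflict "c" is reached with probability 0.5 per step.
chain = {"a": {"a": 0.5, "c": 0.5}, "c": {"c": 1.0}}
D = time_to_conflict(chain, {"c"}, 4)
```

For this chain the time to conflict from state "a" is geometric, so the histogram halves at each second and the remaining mass collects in the final bin.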


APPENDIX H
TRACKER


For the purposes of evaluating the dynamic programming logic in simulation, a tracker was developed that emulates the behavior of the tracker implemented in TCAS for aircraft reporting altitude with 25-ft quantization [6]. The output of the tracker is not a probability distribution over states, a common example being a Gaussian distribution in the case of Kalman filtering, but rather a single point estimate of altitude, range, and range rate, among others. This appendix describes the details of the tracker.

The  internal  state  of  the  tracker  at  time  t  is 

(/^'own(0i  ^(0’  ^*^own(05  ^(0’  ^(0)*  (^^"0 

The measurement at time t is

(/w(0i/hnt(0:X(0.^(0)-  (H-2) 

Upon receiving the first measurement, the tracker initializes the altitudes and intruder range to their measured values and the intruder range rate to zero. After receiving the second measurement, the tracker estimates the intruder range rate and the altitude rates using finite differences, i.e., the difference between the current measurements and the previous ones divided by the time step between measurements (nominally one second). The intruder range acceleration is also initialized to zero upon receipt of the second measurement.

The own altitude and own altitude rate are updated at each time using a two-step process. First, the own altitude at time $t + \Delta t$ is predicted using a constant velocity model:

$$h_{\text{own,pred}}(t + \Delta t) = h_{\text{own}}(t) + \dot h_{\text{own}}(t)\,\Delta t. \tag{H-3}$$

Then the own altitude and own altitude rate at time $t + \Delta t$ are updated using the most recent measurement:

$$h_{\text{own}}(t + \Delta t) = h_{\text{own,pred}}(t + \Delta t) + \alpha\big(h_{\text{own,meas}}(t + \Delta t) - h_{\text{own,pred}}(t + \Delta t)\big), \tag{H-4}$$

$$\dot h_{\text{own}}(t + \Delta t) = \dot h_{\text{own}}(t) + \beta\big(h_{\text{own,meas}}(t + \Delta t) - h_{\text{own,pred}}(t + \Delta t)\big), \tag{H-5}$$

where $h_{\text{own,meas}}$ is the measured own altitude and $\alpha$ and $\beta$ control the level of correction in the own altitude and own altitude rate, respectively, due to the measurement. This is known as an $\alpha$-$\beta$ tracker. The intruder altitude and altitude rate are updated similarly. If the intruder altitude has not changed for over 6.5 seconds, the intruder altitude is reset to its measured altitude and the intruder altitude rate to zero. The $\alpha$ and $\beta$ values for the own aircraft updating are fixed at 0.6 and 0.257, respectively. However, the $\alpha$ and $\beta$ values for the intruder aircraft updating are dependent on the size of the prediction error and the magnitude of the intruder altitude rate.
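
One predict-and-correct cycle of the own-altitude tracker can be sketched as follows. This is a minimal Python sketch using the fixed own-aircraft gains quoted above. The function name `alpha_beta_step` and the example numbers are illustrative, and the rate correction applies $\beta$ directly to the residual, which assumes the nominal one-second step (a general $\alpha$-$\beta$ filter would typically divide the rate gain by the time step).

```python
def alpha_beta_step(h, h_dot, h_meas, dt=1.0, alpha=0.6, beta=0.257):
    """One alpha-beta update of altitude (ft) and altitude rate (ft/s).

    h, h_dot: current estimates; h_meas: new altitude measurement.
    Assumes the nominal one-second step so beta acts on the residual directly.
    """
    h_pred = h + h_dot * dt              # constant-velocity prediction
    residual = h_meas - h_pred           # prediction error
    h_new = h_pred + alpha * residual    # corrected altitude
    h_dot_new = h_dot + beta * residual  # corrected altitude rate
    return h_new, h_dot_new


# Hypothetical example: the measurement arrives 10 ft above the prediction.
h_new, h_dot_new = alpha_beta_step(h=1000.0, h_dot=10.0, h_meas=1020.0)
```

In this example the tracker moves 60 percent of the way from the predicted altitude toward the measurement and raises the rate estimate accordingly.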

The update for the range and range rate uses an $\alpha$-$\beta$-$\gamma$ tracker. The predicted range and range rate are

$$r_{\text{pred}}(t + \Delta t) = r(t) + \dot r(t)\,\Delta t, \tag{H-6}$$

$$\dot r_{\text{pred}}(t + \Delta t) = \dot r(t) + \ddot r(t)\,\Delta t. \tag{H-7}$$

The updated range and range rate at time $t + \Delta t$ are then

$$r(t + \Delta t) = r_{\text{pred}}(t + \Delta t) + \alpha\big(r_{\text{meas}}(t + \Delta t) - r_{\text{pred}}(t + \Delta t)\big), \tag{H-8}$$

$$\dot r(t + \Delta t) = \dot r_{\text{pred}}(t + \Delta t) + \beta\big(r_{\text{meas}}(t + \Delta t) - r_{\text{pred}}(t + \Delta t)\big), \tag{H-9}$$

where $r_{\text{meas}}$ is the measured range. When a sufficient amount of time has passed since the own aircraft first started a track on the intruder, i.e., if the intruder has sufficient "firmness," then the range acceleration is updated as

$$\ddot r(t + \Delta t) = \ddot r(t) + \gamma\big(r_{\text{meas}}(t + \Delta t) - r_{\text{pred}}(t + \Delta t)\big). \tag{H-10}$$

The $\alpha$, $\beta$, and $\gamma$ coefficients for the range updating generally decrease as the firmness increases, indicating increased confidence in the prediction.

All values for $\alpha$, $\beta$, and $\gamma$ come from the minimum operational performance standards for TCAS II [6].
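
The range update can be sketched in the same style as the altitude update. This is a minimal Python sketch in which the gains are passed in by the caller, since the firmness-dependent values come from the standard and are not reproduced here. The function name `range_tracker_step`, the boolean `firm` flag, and the gain values in the example are all hypothetical, and the nominal one-second step is assumed.

```python
def range_tracker_step(r, r_dot, r_ddot, r_meas, alpha, beta, gamma,
                       dt=1.0, firm=True):
    """One alpha-beta-gamma update of range, range rate, and range acceleration.

    The acceleration is only corrected when the track is considered firm.
    Assumes the nominal one-second step so gains act on the residual directly.
    """
    r_pred = r + r_dot * dt              # predicted range
    r_dot_pred = r_dot + r_ddot * dt     # predicted range rate
    residual = r_meas - r_pred           # range prediction error
    r_new = r_pred + alpha * residual
    r_dot_new = r_dot_pred + beta * residual
    r_ddot_new = r_ddot + gamma * residual if firm else r_ddot
    return r_new, r_dot_new, r_ddot_new


# Hypothetical gains and measurement, for illustration only.
state = range_tracker_step(r=10000.0, r_dot=-500.0, r_ddot=0.0, r_meas=9480.0,
                           alpha=0.5, beta=0.2, gamma=0.1)
```

Passing `firm=False` leaves the acceleration estimate unchanged, which mirrors the condition that the acceleration is updated only once the track has sufficient firmness.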




APPENDIX  I 
MINI  TCAS 


A simplified version of TCAS, called mini TCAS in this report, was implemented that issues RAs based only on one perfect aircraft state defined in terms of $h$, $\tau$, $\dot h_1$, and $\dot h_2$. The major assumptions of mini TCAS are:


1. The intruder is not TCAS-equipped but is reporting altitude and is under perfect surveillance.

2. The horizontal range rate is $-500$ ft/s.

3. No tracking or encounter monitoring is performed. Hence, mini TCAS is a memory-less system.

4. Only initial RA sense and strength are selected. Thus, no strength increases or reversals are issued.

5. No minimum or maximum altitudes are enforced.

6. No intruder intent information, in the form of an RA coordination message, is received.

7. The tau-rising test and horizontal miss distance test are not performed.

The various constants that mini TCAS uses are listed in Table I-1. These constants will be discussed in the remainder of the appendix.


TABLE I-1

Constants used in mini TCAS.

Constant   Description
DMOD       Threshold that defines safety buffer around own aircraft used for threat detection
RDTHR      Threshold that defines converging intruders
TRTHR      Range threshold for converging intruders in the range test
H1         Threshold for determining if diverging intruder passes the range test
ZTHR       Altitude threshold for threat detection
TVTHR      Time threshold for time until co-altitude in the altitude test
ALIM       Altitude threshold for RA selection


Given $h$, $\tau$, $\dot h_1$, and $\dot h_2$, mini TCAS calculates the slant range, $r_s$, and the slant range rate, $\dot r_s$. It assumes that the intruder is closing horizontally at $\dot r_h = -500$ ft/s. Using the fact that $\tau = -r_h / \dot r_h$ for $r_h > 0$, the slant range is given by

$$r_s = \sqrt{r_h^2 + h^2} = \sqrt{(\tau \dot r_h)^2 + h^2}. \tag{I-1}$$

The slant range rate is

$$\dot r_s = \frac{r_h \dot r_h + h \dot h}{r_s}, \tag{I-2}$$

where $\dot h = \dot h_2 - \dot h_1$.


Because mini TCAS decides which RA, if any, to issue based solely on the current aircraft state, the critical interval was approximated by [modified_tau_uncapped, true_tau_uncapped]. If $r_s < \text{DMOD}$, modified_tau_uncapped is zero, indicating the critical interval starts immediately because the own aircraft safety buffer defined by DMOD has already been violated. Otherwise,

$$\text{modified\_tau\_uncapped} = -\frac{r_s - \text{DMOD}^2 / r_s}{\min(\dot r_s, -\text{RDTHR})}, \tag{I-3}$$

where RDTHR = 10 ft/s. The end of the critical interval, true_tau_uncapped, is always defined as

$$\text{true\_tau\_uncapped} = -\frac{r_s}{\min(\dot r_s, -\text{RDTHR})}. \tag{I-4}$$

If the intruder is diverging ($\dot r_s > 0$), the critical interval is [0, 0], or undefined.
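
The slant geometry and critical interval computations can be sketched as follows. This is a minimal Python sketch, assuming altitudes in feet, rates in feet per second, and times in seconds. The function names are illustrative, and the DMOD value used in the example is hypothetical because the actual value depends on the sensitivity level.

```python
import math

RDTHR = 10.0  # ft/s, as stated in the text


def slant_geometry(h, tau, h_dot1, h_dot2, r_h_dot=-500.0):
    """Return (slant range, slant range rate) from the mini TCAS state."""
    r_h = -tau * r_h_dot                 # horizontal range from tau = -r_h / r_h_dot
    r_s = math.hypot(r_h, h)             # slant range
    r_s_dot = (r_h * r_h_dot + h * (h_dot2 - h_dot1)) / r_s
    return r_s, r_s_dot


def critical_interval(r_s, r_s_dot, dmod):
    """Return (modified_tau_uncapped, true_tau_uncapped) in seconds."""
    if r_s_dot > 0.0:
        return 0.0, 0.0                  # diverging intruder
    closing = min(r_s_dot, -RDTHR)       # bound the closure rate away from zero
    true_tau = -r_s / closing
    if r_s < dmod:
        modified_tau = 0.0               # safety buffer already violated
    else:
        modified_tau = -(r_s - dmod ** 2 / r_s) / closing
    return modified_tau, true_tau


# Hypothetical co-altitude encounter 20 s from horizontal conflict, DMOD = 2000 ft.
r_s, r_s_dot = slant_geometry(h=0.0, tau=20.0, h_dot1=0.0, h_dot2=0.0)
interval = critical_interval(r_s, r_s_dot, dmod=2000.0)
```

The sketch does not guard against a zero slant range, which would occur only if both the horizontal range and the vertical separation were exactly zero.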

The  following  sections  describe  the  threat  detection,  sense  selection,  and  strength  selection 
components  of  mini  TCAS. 


I.1 THREAT DETECTION

The  intruder  is  declared  a  threat  if  and  only  if  it  passes  the  range  and  altitude  tests  and  fails  the 
altitude  separation  test. 

I.1.1 Range Test

The  first  step  in  the  threat  detection  process  is  the  range  test.  The  intruder  passes  the  range 
test  if  either  of  the  following  is  true. 

1.  (converging  intruder)  Vg  <  RDTHR  A  Tg  <  TRTHR  A  modified_tau_uncapped  <  TRTHR; 

2. (diverging intruder) $\dot r_s > \text{RDTHR} \wedge r_s < \text{DMOD} \wedge r_s \dot r_s < \text{H1}$,

where TRTHR and H1 are thresholds dependent on the encounter sensitivity level. Note that this is a simplification of the true range test in that the tau-rising test and the horizontal miss distance test are not performed.

I.1.2 Altitude Test

If there is already insufficient vertical separation between the aircraft, i.e., if $|h| < \text{ZTHR}$, the intruder passes the altitude test automatically if modified_tau_uncapped is zero. Otherwise, the intruder passes the test if the projected vertical miss distance, $vmd$, during the critical interval is less than ZTHR. The projected relative altitudes at the beginning and end of the critical interval, respectively, are determined by linear extrapolation:

$$h_{\text{beg}} = h + (\dot h_2 - \dot h_1) \cdot \text{modified\_tau\_uncapped}, \tag{I-5}$$

$$h_{\text{end}} = h + (\dot h_2 - \dot h_1) \cdot \text{true\_tau\_uncapped}. \tag{I-6}$$

If the aircraft are projected to cross altitudes during the critical interval ($\operatorname{sign}(h_{\text{beg}}) \ne \operatorname{sign}(h_{\text{end}})$), then $vmd = 0$. Otherwise, $vmd = \min(|h_{\text{beg}}|, |h_{\text{end}}|)$. Consequently, if $vmd < \text{ZTHR}$, the intruder passes the altitude test.
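
The linear extrapolation and miss distance rule above can be sketched as follows. This is a minimal Python sketch, assuming consistent units (feet, feet per second, seconds). The helper `sign` and the function name `projected_vmd` are illustrative.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def projected_vmd(h, h_dot1, h_dot2, modified_tau, true_tau):
    """Projected vertical miss distance over the critical interval."""
    h_beg = h + (h_dot2 - h_dot1) * modified_tau   # separation at interval start
    h_end = h + (h_dot2 - h_dot1) * true_tau       # separation at interval end
    if sign(h_beg) != sign(h_end):
        return 0.0                                 # altitudes cross in the interval
    return min(abs(h_beg), abs(h_end))


# Hypothetical encounters: 1000 ft separation with a descending intruder.
crossing = projected_vmd(1000.0, 0.0, -50.0, 10.0, 30.0)
closing = projected_vmd(1000.0, 0.0, -10.0, 10.0, 30.0)
```

In the first example the relative altitude changes sign between the start and end of the interval, so the miss distance is zero. In the second it shrinks from 900 ft to 700 ft without crossing.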

If, on the other hand, $|h| \ge \text{ZTHR}$ and there is some finite value for the vertical $\tau$, or time to co-altitude, the intruder passes the altitude test if the vertical $\tau$ is less than TVTHR. The vertical $\tau$ is finite if the rate of change of the vertical separation magnitude $a = |h|$ is less than $-1$ ft/s. That is,

$$\dot a = (\dot h_2 - \dot h_1)\,\operatorname{sign}(h) < -1 \text{ ft/s}. \tag{I-7}$$

If $\dot a \ge -1$ ft/s, the intruder fails the altitude test. The threshold TVTHR is a function of the sensitivity level and dependent on whether or not the vertical closure rate is due primarily to the intruder. The vertical closure rate is attributed primarily to the intruder if either (1) the vertical rate of own aircraft is less than 600 ft/min or (2) both aircraft are climbing or both aircraft are descending and $|\dot h_2| > |\dot h_1|$. Otherwise, the vertical closure rate is not primarily due to the intruder.

If $|h| \ge \text{ZTHR}$ and modified_tau_uncapped = 0, the intruder fails the altitude test.

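I.1.3 Altitude Separation Test

The intruder passes the altitude separation test if at least one of the following is true.

1. $|h| > 600$ ft $\wedge\ (\dot h_1 = 0 \vee \dot h_2 = 0 \vee \operatorname{sign}(\dot h_1) = \operatorname{sign}(\dot h_2))$;

2. $|h| > 850$ ft $\wedge\ \dot h_1 \ne 0 \wedge \dot h_2 \ne 0 \wedge \operatorname{sign}(\dot h_1) = -\operatorname{sign}(\dot h_2)$.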
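
The two conditions of the altitude separation test, and the way the three tests combine into a threat declaration, can be sketched as follows. This is a minimal Python sketch in which the outcomes of the range and altitude tests are passed in as booleans. The function names are illustrative.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def passes_altitude_separation_test(h, h_dot1, h_dot2):
    """True if the aircraft are already well separated vertically."""
    level_or_same_direction = (
        h_dot1 == 0 or h_dot2 == 0 or sign(h_dot1) == sign(h_dot2)
    )
    opposite_direction = (
        h_dot1 != 0 and h_dot2 != 0 and sign(h_dot1) == -sign(h_dot2)
    )
    return (abs(h) > 600.0 and level_or_same_direction) or (
        abs(h) > 850.0 and opposite_direction
    )


def is_threat(passes_range_test, passes_altitude_test, h, h_dot1, h_dot2):
    """Threat if the range and altitude tests pass and the separation test fails."""
    return (
        passes_range_test
        and passes_altitude_test
        and not passes_altitude_separation_test(h, h_dot1, h_dot2)
    )
```

For example, with 700 ft of separation a level own aircraft passes the separation test, while aircraft moving in opposite vertical directions need more than 850 ft to pass it.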


I.2 SENSE SELECTION


Sense selection is performed if and only if the intruder is declared a threat. As discussed in Section 1.2, the response to both upward-sense (Climb) and downward-sense (Descend) RAs is modeled and the projected vertical separation at the beginning and end of the critical interval, $h_{\text{beg}}$ and $h_{\text{end}}$, respectively, is calculated. The intruder's trajectory is modeled as a straight line with a constant vertical rate. A pilot delay model is implemented for own aircraft in which the pilot requires five seconds to respond to the RA. After the pilot-response delay, the own aircraft trajectory is modeled as accelerating at 0.25 g (8 ft/s$^2$) until reaching the target vertical rate, after which it maintains that rate for the remainder of the encounter. The target vertical rate is 1500 ft/min for the Climb RA and $-1500$ ft/min for the Descend RA. If modified_tau_uncapped = 0, $h_{\text{beg}} = h$, and if true_tau_uncapped = 0, $h_{\text{end}} = h$. The vertical separation during the critical interval, $vmd$, is

$$vmd = \begin{cases} h_{\text{beg}} & \text{if } |h_{\text{end}}| \ge |h_{\text{beg}}|, \\ h_{\text{end}} & \text{otherwise.} \end{cases} \tag{I-8}$$

Let $vmd(\text{Climb})$ denote $vmd$ for the Climb RA and $vmd(\text{Descend})$ denote $vmd$ for the Descend RA.

If the aircraft are not considered to be co-altitude ($|h| > 100$ ft), then the downward sense is selected if the upward-sense RA is an altitude-crossing RA, the downward-sense RA is not, and the downward-sense RA provides at least ALIM separation:

$$\operatorname{sign}(h) \ne \operatorname{sign}(vmd(\text{Climb})) \wedge \operatorname{sign}(h) = \operatorname{sign}(vmd(\text{Descend})) \wedge |vmd(\text{Descend})| \ge \text{ALIM}. \tag{I-9}$$

The threshold ALIM is a function of the altitude layer of own aircraft. Similarly, the upward sense is selected if

$$\operatorname{sign}(h) \ne \operatorname{sign}(vmd(\text{Descend})) \wedge \operatorname{sign}(h) = \operatorname{sign}(vmd(\text{Climb})) \wedge |vmd(\text{Climb})| \ge \text{ALIM}. \tag{I-10}$$

If the previous criteria do not hold, then the upward sense is selected if

$$|vmd(\text{Climb})| > |vmd(\text{Descend})| \tag{I-11}$$

and the downward sense is selected otherwise.
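
The sense selection rules can be sketched as a single decision function. This is a minimal Python sketch, assuming the signed separations for the Climb and Descend RAs have already been computed by the response model. The function name `select_sense`, the string return values, and the example numbers are illustrative.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def select_sense(h, vmd_climb, vmd_descend, alim):
    """Return "Climb" or "Descend" given signed separations for each sense."""
    if abs(h) > 100.0:  # aircraft are not considered co-altitude
        # Prefer the non-crossing sense if it provides at least ALIM separation.
        if (sign(h) != sign(vmd_climb) and sign(h) == sign(vmd_descend)
                and abs(vmd_descend) >= alim):
            return "Descend"
        if (sign(h) != sign(vmd_descend) and sign(h) == sign(vmd_climb)
                and abs(vmd_climb) >= alim):
            return "Climb"
    # Otherwise choose the sense with the larger separation.
    return "Climb" if abs(vmd_climb) > abs(vmd_descend) else "Descend"


# Hypothetical cases with ALIM = 300 ft.
non_crossing = select_sense(500.0, -200.0, 900.0, 300.0)
fallback = select_sense(500.0, -1000.0, 200.0, 300.0)
co_altitude = select_sense(50.0, 400.0, -300.0, 300.0)
```

In the first case climbing would cross altitudes while descending would not and gives enough separation, so Descend is chosen. In the second case the non-crossing sense falls short of ALIM, so the larger separation decides.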


I.3 STRENGTH SELECTION

The strength selection process proceeds by first calculating the vertical miss distance during the critical interval using linear extrapolation, as was done during the threat detection process (Equations I-5 and I-6). Vertical speed limits (VSLs) are not modeled if at least one of the following is true.

1. $|\dot h_2| < 1000$ ft/min $\wedge\ |h| < \text{ALIM} \wedge |vmd(\cdot)| < \text{ALIM} \wedge |\dot h_1| < 600$ ft/min;

2. $|\dot h_2| > 1000$ ft/min $\wedge\ |vmd(\cdot)| < \text{ALIM} \wedge |\dot h_1| < 600$ ft/min,

where $vmd(\cdot)$ is the vertical separation during the critical interval, $vmd(\text{Climb})$ if the upward sense is selected and $vmd(\text{Descend})$ if the downward sense is selected. Otherwise, each VSL is modeled and the vertical separation during the critical interval is calculated exactly as in the sense selection process (Equation I-8).

The least restrictive VSL that provides at least the target separation is selected as the RA. The target separation for all VSLs except Do Not Climb and Do Not Descend is ALIM + 75 ft. The target separation for Do Not Climb and Do Not Descend is ALIM. If no VSL provides at least the target separation, the positive RA in the given sense is selected.

If  a  VSL  is  selected,  but  it  is  corrective,  i.e.,  results  in  own  aircraft  changing  its  vertical  rate, 
then  Do  Not  Climb  or  Do  Not  Descend  is  selected  instead,  depending  on  the  sense. 


NOTATION


$a$                   action
$b$                   belief state
$B$                   Bellman update operator
$c(s, a)$             cost function
$h$                   vertical separation
$\dot h_1$            own vertical rate
$\dot h_2$            intruder vertical rate
$J$                   cost-to-go function
$J^*$                 optimal cost-to-go function
$\Pr(A)$              probability of alert
$\Pr(C)$              probability of conflict
$\pi$                 policy
$\pi^*$               optimal policy
$s$                   state
$s_{\text{RA}}$       RA state
$t$                   time
$T(s' \mid s, a)$     transition model
$\tau$                time to horizontal conflict
$x$                   state (interpreted as a vector)